Patents by Inventor Fuchun Peng

Fuchun Peng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

QUERY IDENTIFICATION AND NORMALIZATION FOR WEB SEARCH

Publication number: 20100257150

Abstract: A computer-implemented method for processing user entered query data to improve results of a search of pages using a local search database, when searching the internet, is disclosed. The method includes receiving the user entered query data, parsing each word of the query data, and segmenting words using a probabilistic dictionary to determine a likelihood that the word is for a particular name. The method then associates the particular names with a name tag to create one or more tagged name terms, and normalizes each of the tagged name terms, the normalizing including boosting information if found in the local search database and determining proximity between selected ones of the tagged name terms. The method then generates an optimized search query that incorporates normalized terms and operators. The optimized search query is applied to the internet to enable search results to be produced and displayed to the user in response to the entered query data.

Type: Application

Filed: June 17, 2010

Publication date: October 7, 2010

Applicant: Yahoo!, Inc.

Inventors: Yumao Lu, Nawaaz Ahmed, Fuchun Peng, Marco Zagha
Abbreviation handling in web search

Patent number: 7809715

Abstract: A method for handling abbreviations in web queries includes building a dictionary of a plurality of possible word expansions for a plurality of potential abbreviations related to query terms received or anticipated to be received by a search engine; accepting a query including an abbreviation; expanding the abbreviation into one of the plurality of word expansions if a probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration a context of the abbreviation within the query, wherein the context includes at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query.

Type: Grant

Filed: April 15, 2008

Date of Patent: October 5, 2010

Assignee: Yahoo! Inc.

Inventors: Xing Wei, Fuchun Peng, Benoit Dumoulin
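
As an illustration of the thresholded expansion described in the abstract for patent 7809715, here is a minimal Python sketch. The dictionary contents, the probabilities, the `DEFAULT_PROBABILITY` fallback, and the function names are all hypothetical. The abstract states that the context includes at least anchor text; this sketch simplifies that to the other words in the query and is not the patented method.

```python
from typing import Dict, List

# Hypothetical dictionary: abbreviation -> {expansion: {context word: probability}}.
# Every entry and probability here is invented for illustration.
ABBREVIATIONS: Dict[str, Dict[str, Dict[str, float]]] = {
    "ny": {"new york": {"hotels": 0.95, "pizza": 0.90}},
    "ca": {
        "california": {"beaches": 0.92},
        "canada": {"visa": 0.85},
    },
}

DEFAULT_PROBABILITY = 0.30  # assumed score when no context word supports an expansion


def expansion_probability(context_probs: Dict[str, float], context: List[str]) -> float:
    """Return the highest probability supported by any context word."""
    scores = [context_probs[w] for w in context if w in context_probs]
    return max(scores) if scores else DEFAULT_PROBABILITY


def expand_query(query: str, threshold: float = 0.8) -> str:
    """Expand an abbreviation only when the best expansion clears the threshold."""
    words = query.lower().split()
    out = []
    for i, word in enumerate(words):
        candidates = ABBREVIATIONS.get(word)
        if not candidates:
            out.append(word)
            continue
        context = words[:i] + words[i + 1:]
        best, best_p = max(
            ((exp, expansion_probability(probs, context)) for exp, probs in candidates.items()),
            key=lambda pair: pair[1],
        )
        # Keep the abbreviation as typed when confidence is too low.
        out.append(best if best_p > threshold else word)
    return " ".join(out)
```

With these made-up values, `expand_query("ca visa")` expands the abbreviation because the word "visa" supports "canada", while `expand_query("ca weather")` leaves the query unchanged because no expansion clears the threshold.
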
Predictive stemming for web search with statistical machine translation models

Patent number: 7788276

Abstract: Techniques for determining when and how to transform words in a query to return the most relevant search results while minimizing computational overhead are provided. A dictionary is generated based upon words used in a specified number of previous most frequent search queries and comprises lists of transformations that may include variants based upon the stems of words, synonyms, and abbreviation expansions. When a query is received from a user, candidate queries are generated based upon replacing particular words in the query with a transformation of the particular words. Candidate queries are selected that have a high probability of returning relevant results by computing values of the query using language model scoring and translation scoring. The selected candidate queries and the original query are executed to return search results. The search results are displayed to the user with the words in the original query and the transformed words in bold.

Type: Grant

Filed: August 22, 2007

Date of Patent: August 31, 2010

Assignee: Yahoo! Inc.

Inventors: Fuchun Peng, Nawaaz Ahmed, Yumao Lu, Marco J. Zagha
SEARCH QUERY DISAMBIGUATION

Publication number: 20100205198

Abstract: Disclosed herein is a system and method of query disambiguation. At least one model is generated using training data, which model can be used to score, or rank, possible interpretations identified for a query, which can be used to select an interpretation from a number of possible interpretations. A selected interpretation can be used to process a web search request, e.g., to generate search results that relate to the selected query interpretation, rank or order the items in the search result based on relevance to the selected query interpretation, and/or identify a presentation to be used to display the search results based on the selected query interpretation.

Type: Application

Filed: February 6, 2009

Publication date: August 12, 2010

Inventors: Gilad Mishne, Raymond Stata, Fuchun Peng
Local query identification and normalization for web search

Patent number: 7769746

Abstract: Computer-implemented methods and systems for processing user entered query data to improve results of a search of pages using a local search database are provided, when searching the internet. The method includes receiving the user entered query data and parsing each word of the query data and examining each word to determine if the word is associated with one of a business name, a city name or a state name. The examining uses probabilistic dictionaries to determine a likelihood that the word is for a particular term or intent. The method further includes normalizing each of the tagged business terms. The normalizing includes boosting information if found in the local search database and determining proximity between selected ones of the tagged terms. The method then generates an optimized internal search query that incorporates constraints and ranking based on at least the boosting information and the determined proximity between the selected tagged terms.

Type: Grant

Filed: January 16, 2008

Date of Patent: August 3, 2010

Assignee: Yahoo! Inc.

Inventors: Yumao Lu, Nawaaz Ahmed, Fuchun Peng, Marco Zagha
SYSTEM AND METHOD FOR IMPROVED SEARCH RELEVANCE USING PROXIMITY BOOSTING

Publication number: 20100191758

Abstract: A system and method for improved search relevance using proximity boosting. A query for a web search is received from a user, via a network, wherein the query comprises a plurality of query tokens. One or more concepts are identified in the query, wherein each of the concepts comprises at least two query tokens. A relative concept strength is determined for each of the identified concepts. The query is then rewritten for submission to a search engine, wherein for each of the one or more concepts, a syntax rule associated with the respective relative concept strength of the concept is applied to the query tokens comprising the concept such that the rewritten query represents the one or more concepts, whereby the proximity of the one or more concepts in a search result returned by the search engine to the user in response to the rewritten query is boosted.

Type: Application

Filed: January 26, 2009

Publication date: July 29, 2010

Applicant: Yahoo! Inc.

Inventors: Fuchun Peng, Xing Wei, Yumao Lu, Xin Li, Donald Metzler, Hang Cui, Benoit Dumoulin
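
The rewrite step in the proximity-boosting abstract above (publication 20100191758) can be pictured with a short Python sketch. The strength cutoffs and the operator syntax (quoted phrases and a `NEAR(...)` operator) are assumptions made for illustration; the abstract does not say which syntax rules or thresholds are used. Concepts are assumed to be non-overlapping token spans.

```python
from typing import List, Tuple

# A concept is a (start index, end index exclusive, relative strength) span
# over the query tokens. The strengths would come from some upstream model.
Concept = Tuple[int, int, float]


def apply_rule(tokens: List[str], strength: float) -> str:
    """Pick a hypothetical syntax rule based on relative concept strength."""
    phrase = " ".join(tokens)
    if strength >= 0.8:
        return f'"{phrase}"'       # strong concept: exact phrase
    if strength >= 0.5:
        return f"NEAR({phrase})"   # medium concept: tokens must appear close together
    return phrase                  # weak concept: leave tokens unconstrained


def rewrite_query(tokens: List[str], concepts: List[Concept]) -> str:
    """Rewrite the query, applying one rule per identified concept."""
    by_start = {start: (end, strength) for start, end, strength in concepts}
    out = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            end, strength = by_start[i]
            out.append(apply_rule(tokens[i:end], strength))
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

For example, with the tokens `["new", "york", "hotel", "deals"]` and the concepts `[(0, 2, 0.9), (2, 4, 0.6)]`, the sketch produces `"new york" NEAR(hotel deals)`.
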
IDENTIFYING AND EXPANDING IMPLICITLY TEMPORALLY QUALIFIED QUERIES

Publication number: 20100131538

Abstract: Methods and apparatus are described for identifying implicitly temporally qualified queries, i.e., queries for which a time period is implied but not explicitly stated, and for expanding such queries to include one or more temporal references.

Type: Application

Filed: November 24, 2008

Publication date: May 27, 2010

Applicant: YAHOO! INC.

Inventors: Rosie Jones, Donald Metzler, Fuchun Peng
Techniques for navigational query identification

Patent number: 7693865

Abstract: To accurately classify a query as navigational, thousands of available features are explored, extracted from major commercial search engine results, user Web search click data, query log, and the whole Web's relational content. To obtain the most useful features for navigational query identification, a three level system is used which integrates feature generation, feature integration, and feature selection in a pipeline. Because feature selection plays a key role in classification methodologies, the best feature selection method is coupled with the best classification approach to achieve the best performance for identifying navigational queries. According to one embodiment, linear Support Vector Machine (SVM) is used to rank features and the top ranked features are fed into a Stochastic Gradient Boosting Tree (SGBT) classification method for identifying whether or not a particular query is a navigational query.

Type: Grant

Filed: August 30, 2006

Date of Patent: April 6, 2010

Assignee: Yahoo! Inc.

Inventors: Yumao Lu, Fuchun Peng, Xin Li, Nawaaz Ahmed
Query rewriting with spell correction suggestions using a generated set of query features

Patent number: 7630978

Abstract: Techniques for rewriting queries submitted to a query engine are provided. A query is submitted by a user and sent to a search mechanism. Based on the query, one or more query suggestions are generated. Features are generated based on the query and the query suggestions. Those features are input to a trained machine learning mechanism that generates a rewrite score. The rewrite score signifies a confidence score that indicates how confident the search mechanism is that the user intended to submit the original query. If the rewrite score is below a certain threshold, then the original query is rewritten to a second query. Results of executing the original query may be sent to the user along with a reference to the second query. Additionally or alternatively, results of executing the second query are sent to the user.

Type: Grant

Filed: December 14, 2006

Date of Patent: December 8, 2009

Assignee: Yahoo! Inc.

Inventors: Xin Li, Nawaaz Ahmed, Fuchun Peng, Yumao Lu
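
The decision rule in the abstract for patent 7630978 (rewrite only when the rewrite score falls below a threshold) can be sketched in Python as follows. The abstract describes a trained machine learning mechanism fed with generated features; this sketch swaps in a simple frequency ratio as a stand-in scorer. The query frequencies, the threshold, and all function names are hypothetical.

```python
from typing import Dict

# Hypothetical query-log frequencies, invented for illustration.
QUERY_FREQ: Dict[str, int] = {
    "britney spears": 5000,
    "brittany spears": 40,
    "kinect": 800,
    "connect": 900,
}


def rewrite_score(original: str, suggestion: str) -> float:
    """Stand-in for the trained model: confidence that the user meant `original`.

    Uses add-one smoothing so unseen queries do not divide by zero.
    """
    o = QUERY_FREQ.get(original, 0) + 1
    s = QUERY_FREQ.get(suggestion, 0) + 1
    return o / (o + s)


def handle_query(original: str, suggestion: str, threshold: float = 0.2) -> Dict[str, str]:
    """Decide which query to execute and which to offer as a reference."""
    if rewrite_score(original, suggestion) < threshold:
        # Low confidence in the original: run the suggestion instead.
        return {"executed": suggestion, "also_offered": original}
    # Otherwise run the original and merely point to the suggestion.
    return {"executed": original, "also_offered": suggestion}
```

With these made-up counts, "brittany spears" is rewritten to "britney spears", while "kinect" is executed as typed, with "connect" offered only as a reference.
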
NORMALIZING QUERY WORDS IN WEB SEARCH

Publication number: 20090259643

Abstract: A method for normalizing query words in web search includes populating a dictionary with join and split candidates and corresponding joined and split words from an aggregate of query logs; determining a confidence score for join and split candidates, a highest confidence score for each being characterized in the dictionary as must-join and must-split, respectively; accepting queries with words amenable to being split or joined, or amenable to an addition or deletion of a hyphen or an apostrophe; generating, based on the accepted queries, split candidates obtained from the dictionary, and candidates of join, hyphen, or apostrophe algorithmically; and submitting to a search engine the generated possible candidates characterized as must-join or must-split in the dictionary, to improve search results returned in response to the queries; applying a language dictionary to generated candidates not characterized as must-split or must-join, to rank them, and submitting those highest-ranked to the search engine.

Type: Application

Filed: April 15, 2008

Publication date: October 15, 2009

Applicant: Yahoo! Inc.

Inventors: Fuchun Peng, George H. Mills, Benoit Dumoulin
ABBREVIATION HANDLING IN WEB SEARCH

Publication number: 20090259629

Abstract: A method for handling abbreviations in web queries includes building a dictionary of a plurality of possible word expansions for a plurality of potential abbreviations related to query terms received or anticipated to be received by a search engine; accepting a query including an abbreviation; expanding the abbreviation into one of the plurality of word expansions if a probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration a context of the abbreviation within the query, wherein the context includes at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query.

Type: Application

Filed: April 15, 2008

Publication date: October 15, 2009

Applicant: Yahoo! Inc.

Inventors: Xing Wei, Fuchun Peng, Benoit Dumoulin
NAME VERIFICATION USING MACHINE LEARNING

Publication number: 20090248595

Abstract: Computer-enabled methods, apparatus, and computer-readable media are provided for verifying that a given network name, such as a URL, is an official, e.g., registered, approved, or otherwise officially recognized, network name that refers to or identifies a principal, such as a business. These techniques involve receiving a principal name and a given network name, receiving at least one feature attribute from at least one database of feature attributes, wherein the at least one feature attribute comprises a characteristic of the principal name or a characteristic of the network name, and invoking a logistic regression method to generate a probability, based upon the at least one feature attribute, that the given network name is an official network name for the principal name. The logistic regression method may include a gradient boosting tree model that generates the probability based upon the at least one feature attribute.

Type: Application

Filed: March 31, 2008

Publication date: October 1, 2009

Inventors: Yumao Lu, Nawaaz Ahmed, Fuchun Peng, Benoit Dumoulin
MULTI-TERM SEARCH RESULT WITH UNSUPERVISED QUERY SEGMENTATION METHOD AND APPARATUS

Publication number: 20090234836

Abstract: Generally, a method and apparatus provides for search results in response to a web search request having at least two search terms in the search request. The method and apparatus includes generating a plurality of term groupings of the search terms and determining a relevance factor for each of the term groupings. The method and apparatus further determines a set of the term groupings based on the relevance factors and therein conducts a web resource search using the set of term groupings, to thereby generate search results. The method and apparatus provides the search results to the requesting entity.

Type: Application

Filed: March 14, 2008

Publication date: September 17, 2009

Applicant: YAHOO! INC.

Inventors: Fuchun Peng, Yumao Lu, Nawaaz Ahmed, Bin Tan
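
To make the "term groupings" and "relevance factor" in the abstract above (publication 20090234836) concrete, here is a small Python sketch that enumerates every contiguous grouping of the query terms and keeps the highest-scoring one. The phrase counts and the scoring formula (count weighted by the squared phrase length) are assumptions for illustration only; the abstract does not specify how the relevance factor is computed.

```python
from itertools import combinations
from typing import Dict, Iterator, List

# Hypothetical phrase counts, invented for illustration.
PHRASE_COUNTS: Dict[str, int] = {
    "new": 1000,
    "york": 300,
    "pizza": 800,
    "new york": 900,
    "york pizza": 50,
}


def term_groupings(terms: List[str]) -> Iterator[List[str]]:
    """Yield every way to split the terms into contiguous groups."""
    n = len(terms)
    for k in range(n):
        # Choose k cut points among the n - 1 gaps between terms.
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [" ".join(terms[a:b]) for a, b in zip(bounds, bounds[1:])]


def relevance_factor(grouping: List[str]) -> int:
    """Assumed score: phrase count weighted by squared phrase length."""
    return sum(PHRASE_COUNTS.get(g, 0) * len(g.split()) ** 2 for g in grouping)


def best_grouping(query: str) -> List[str]:
    """Return the grouping with the highest assumed relevance factor."""
    return max(term_groupings(query.lower().split()), key=relevance_factor)
```

Under these made-up counts, `best_grouping("new york pizza")` selects the grouping that treats "new york" as a single unit. Exhaustive enumeration grows exponentially with query length, so it is only reasonable for short queries.
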
LOCAL QUERY IDENTIFICATION AND NORMALIZATION FOR WEB SEARCH

Publication number: 20090182729

Abstract: Computer-implemented methods and systems for processing user entered query data to improve results of a search of pages using a local search database are provided, when searching the internet. The method includes receiving the user entered query data and parsing each word of the query data and examining each word to determine if the word is associated with one of a business name, a city name or a state name. The examining uses probabilistic dictionaries to determine a likelihood that the word is one of the business name, the city name or the state name. The method then associates the words that were determined to be: (i) the business name with a business name tag to create one or more tagged business terms; (ii) the city name with a city name tag to create one or more tagged city terms; and (iii) the state name with a state name tag to create one or more tagged state terms. The method further includes normalizing each of the tagged business terms, the tagged city terms and the tagged state terms.

Type: Application

Filed: January 16, 2008

Publication date: July 16, 2009

Applicant: Yahoo!, Inc.

Inventors: Yumao Lu, Nawaaz Ahmed, Fuchun Peng, Marco Zagha
Method and Apparatus for Performing Multi-Phase Ranking of Web Search Results by Re-Ranking Results Using Feature and Label Calibration

Publication number: 20090132515

Abstract: A method and apparatus for performing multi-phase ranking of web search results by re-ranking results using feature and label calibration are provided. According to one embodiment of the invention, a ranking function is trained by using machine learning techniques on a set of training samples to produce ranking scores. The ranking function is used to rank the training samples according to their ranking scores, in order of their relevance to a particular query. Next, a re-ranking function is trained on the same training samples to re-rank the documents from the first ranking. The features and labels of the training samples are calibrated and normalized before they are reused to train the re-ranking function. By this method, training data and training features used in past trainings are leveraged to perform additional training of new functions, without requiring the use of additional training data or features.

Type: Application

Filed: November 19, 2007

Publication date: May 21, 2009

Inventors: Yumao Lu, Fuchun Peng, Xin Li, Nawaaz Ahmed
Predictive Stemming for Web Search with Statistical Machine Translation Models

Publication number: 20090055380

Abstract: Techniques for determining when and how to transform words in a query to return the most relevant search results while minimizing computational overhead are provided. A dictionary is generated based upon words used in a specified number of previous most frequent search queries and comprises lists of transformations that may include variants based upon the stems of words, synonyms, and abbreviation expansions. When a query is received from a user, candidate queries are generated based upon replacing particular words in the query with a transformation of the particular words. Candidate queries are selected that have a high probability of returning relevant results by computing values of the query using language model scoring and translation scoring. The selected candidate queries and the original query are executed to return search results. The search results are displayed to the user with the words in the original query and the transformed words in bold.

Type: Application

Filed: August 22, 2007

Publication date: February 26, 2009

Inventors: Fuchun Peng, Nawaaz Ahmed, Yumao Lu, Marco J. Zagha
Word pluralization handling in query for web search

Publication number: 20080189262

Abstract: Techniques for determining when and how to transform a word in a query to its plural or non-plural form in order to provide the most relevant search results while minimizing computational overhead are provided. A dictionary is generated based upon the words used in a specified number of previous most frequent search queries and comprises lists of transformations from plural to singular and singular to plural. Unnecessary transformations are removed from the dictionary based upon language modeling. The word to transform is determined by finding the last non-stop re-writable word of the query. The context of the transformed word is confirmed in the search documents and a version of the query is executed using both the original form of the word and the transformation of the word.

Type: Application

Filed: February 1, 2007

Publication date: August 7, 2008

Inventors: Fuchun Peng, Nawaaz Ahmed, Xin Li, Yumao Lu
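
The word-selection step in the pluralization abstract above (publication 20080189262), finding the last non-stop re-writable word, can be sketched in Python. The stop-word list and the transformation dictionary are hypothetical, and "re-writable" is interpreted here as "has an entry in the dictionary". The language-model pruning and the document-context check mentioned in the abstract are left out.

```python
from typing import Dict, List

# Hypothetical stop words and plural/singular transformations.
STOP_WORDS = {"a", "for", "in", "of", "the"}
PLURAL_DICT: Dict[str, str] = {
    "hotel": "hotels",
    "hotels": "hotel",
    "flight": "flights",
    "flights": "flight",
}


def query_variants(query: str) -> List[str]:
    """Return the original query plus, if possible, one plural/singular variant.

    Scans from the end of the query for the last word that is not a stop
    word and has an entry in the transformation dictionary.
    """
    words = query.lower().split()
    original = " ".join(words)
    for i in range(len(words) - 1, -1, -1):
        word = words[i]
        if word in STOP_WORDS:
            continue
        if word in PLURAL_DICT:
            rewritten = words[:i] + [PLURAL_DICT[word]] + words[i + 1:]
            return [original, " ".join(rewritten)]
    # No re-writable word found: only the original query is executed.
    return [original]
```

For the query "cheap hotel in paris", the scan skips "paris" (no dictionary entry) and "in" (stop word), then rewrites "hotel", so both "cheap hotel in paris" and "cheap hotels in paris" would be executed.
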
Query rewriting with spell correction suggestions

Publication number: 20080147637

Abstract: Techniques for rewriting queries submitted to a query engine are provided. A query is submitted by a user and sent to a search mechanism. Based on the query, one or more query suggestions are generated. Features are generated based on the query and the query suggestions. Those features are input to a trained machine learning mechanism that generates a rewrite score. The rewrite score signifies a confidence score that indicates how confident the search mechanism is that the user intended to submit the original query. If the rewrite score is below a certain threshold, then the original query is rewritten to a second query. Results of executing the original query may be sent to the user along with a reference to the second query. Additionally or alternatively, results of executing the second query are sent to the user.

Type: Application

Filed: December 14, 2006

Publication date: June 19, 2008

Inventors: Xin Li, Nawaaz Ahmed, Fuchun Peng, Yumao Lu
Techniques for navigational query identification

Publication number: 20080059508

Abstract: To accurately classify a query as navigational, thousands of available features are explored, extracted from major commercial search engine results, user Web search click data, query log, and the whole Web's relational content. To obtain the most useful features for navigational query identification, a three level system is used which integrates feature generation, feature integration, and feature selection in a pipeline. Because feature selection plays a key role in classification methodologies, the best feature selection method is coupled with the best classification approach to achieve the best performance for identifying navigational queries. According to one embodiment, linear Support Vector Machine (SVM) is used to rank features and the top ranked features are fed into a Stochastic Gradient Boosting Tree (SGBT) classification method for identifying whether or not a particular query is a navigational query.

Type: Application

Filed: August 30, 2006

Publication date: March 6, 2008

Inventors: Yumao Lu, Fuchun Peng, Xin Li, Nawaaz Ahmed
Predicting results for input data based on a model generated from clusters

Publication number: 20070282591

Abstract: A method for predicting results for input data based on a model that is generated based on clusters of related characters, clusters of related segments, and training data. The method comprises receiving a data set that includes a plurality of words in a particular language. In the particular language, words are formed by characters. Clusters of related characters are formed from the data set. A model is generated based at least on the clusters of related characters and training data. The model may also be based on the clusters of related segments. The training data includes a plurality of entries, wherein each entry includes a character and a designated result for said character. A set of input data that includes characters that have not been associated with designated results is received. The model is applied to the input data to determine predicted results for characters within the input data.

Type: Application

Filed: June 1, 2006

Publication date: December 6, 2007

Inventor: Fuchun Peng
