SYSTEMS AND METHODS FOR REPRESENTING SEARCH QUERY REWRITES
Various embodiments include systems and methods for generating query rewrite records which may be used to generate standardized query rewrites for a search engine. Such records may identify rewrite triggers as well as constraints and other metadata flags which may be associated with certain rewrites in query rewrite input language (QRIL) records. In certain embodiments, such records may be analyzed with other QRIL records or rewrite information to prevent rewrite conflicts and to generate standardized rewrites. This information may then be used by a search engine to generate responses to user queries.
This application is a continuation of U.S. patent application Ser. No. 14/548,105, filed on Nov. 19, 2014, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present application relates generally to the technical field of electronic searching, and in particular to query rewrite systems and processes which may be used as part of electronic searching.
BACKGROUND
In an online system providing search results based on user queries, often the objects being searched are evaluated under a variety of factors in order to produce search results that meet the user's needs as well as the needs of the online system. Query rewriting is one aspect of such a search engine. Query rewriting functions to adjust the terms used in a search to match the available search results, and in some systems query rewriting is primarily responsible for establishing the set of results that are retrieved in response to a user's search query. Systems and methods described herein relate to improved query rewriting.
Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings in which:
Example methods and systems for electronic searching are described, including example embodiments of query rewrite systems and processes which are used with electronic searching.
Query rewriting is an aspect of certain search engines. Query rewriting refers to a process of matching query terms received from a user with synonyms or other known information about query terms, and using that information to provide a set of search results that are superior to search results that would be provided by applying a standard search algorithm to the received query terms. As such, query rewriting may play a role in the processing of a user's search query and in the generation of a set of search results which is the set of results sent to a user in response to the user's query.
Certain embodiments described herein implement improved query rewriting using a query rewrite input language (QRIL) in combination with rewrite systems and methods. For example, a search engine may include an ad hoc set of rewrite instructions which are generated individually or in groups, but without systems and methods for considering the impact of new rewrites on the system. As additional rewrites are added to such an ad hoc system of rewrites, conflicts between different rewrites may be present without a system operator being aware of the conflicts. Such a conflict may exist, for example, when a search term or trigger is associated with two different rewrite values. This may produce an unexpected and undesired set of search results in response to a user query, depending on how the rewrite values are applied. Embodiments described herein may transcode individual rewrites into QRIL records which identify the characteristics of an individual rewrite. The QRIL record may then be processed by a QRIL processor along with all other QRIL records in a system to generate a set of standardized rewrites. When the QRIL record is processed, a standardized structural relationship is established with any overlapping or conflicting QRIL records and the associated rewrites. For example, two QRIL records with overlapping constraints that indicate that a query token should be rewritten in two different ways are resolved by the QRIL processor according to precedence rules. The precedence rules may be based on rewrite type, entry time, entry entity, or any other metadata or flag contained within the QRIL record. The standardized rewrites as generated by the QRIL processor may then be provided to a search engine for use in responding to user search queries. This may be the same search engine from which the ad hoc set of query rewrites was obtained, or this may be a different search engine.
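The precedence-based resolution described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the described QRIL processor: the record fields, the `TYPE_RANK` ordering of rewrite types, the rule that a later entry time wins a tie, and the definition of "overlapping constraints" (same trigger, no constraint key bound to different values) are all hypothetical. The description says only that precedence may depend on rewrite type, entry time, entry entity, or other metadata.

```python
from dataclasses import dataclass

# Hypothetical precedence ranking: a lower number wins. This ordering is an
# assumption for illustration only.
TYPE_RANK = {"whole_query": 0, "phrase": 1, "token_refinement": 2, "direct": 3}


@dataclass(frozen=True)
class QrilRecord:
    trigger: str
    rewrite_value: str
    rewrite_type: str
    entry_time: int          # e.g. a Unix timestamp; later entries win ties
    constraints: frozenset   # e.g. frozenset({("site", "USA")})


def overlaps(a: QrilRecord, b: QrilRecord) -> bool:
    """Two records overlap if they share a trigger and no constraint key
    is bound to different values in the two records."""
    if a.trigger != b.trigger:
        return False
    da, db = dict(a.constraints), dict(b.constraints)
    return all(da[k] == db[k] for k in da.keys() & db.keys())


def resolve(records: list[QrilRecord]) -> list[QrilRecord]:
    """Keep, for each group of overlapping records, only the record that
    wins under the assumed precedence rules."""
    # Highest precedence first: best rewrite-type rank, then newest entry.
    ordered = sorted(records, key=lambda r: (TYPE_RANK[r.rewrite_type], -r.entry_time))
    winners: list[QrilRecord] = []
    for rec in ordered:
        # A record survives only if no already-accepted record overlaps it.
        if not any(overlaps(rec, w) for w in winners):
            winners.append(rec)
    return winners
```

Records constrained to different sites do not overlap in this sketch, so both survive; two records for the same trigger on the same site are reduced to one.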
Many search engine embodiments have tight service specifications which require that a response be sent to a user search query within a short amount of time. Because of this, a query rewrite system according to certain example embodiments must provide a rewrite within fractions of a second, or even fractions of a millisecond in some embodiments. Such service requirements do not allow the search engine or rewrite system to make calls to a QRIL record database or standardized rewrite database, due to the time associated with such calls. Instead, in certain example embodiments, standardized rewrites are integrated in a search engine system to provide adequate query rewrite response time when a user query is received.
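One way to picture integrating standardized rewrites into the search engine is a table loaded into process memory ahead of time, so that serving a query costs only dictionary lookups. The sketch below assumes single-token triggers, whitespace tokenization, and lower-casing; the class name `InMemoryRewriter` is hypothetical.

```python
class InMemoryRewriter:
    """Hypothetical in-process table of standardized rewrites.

    The table is built once (e.g. at search-engine start-up or on a periodic
    refresh) so that rewriting a query needs no external database call.
    """

    def __init__(self, standardized_rewrites: dict[str, str]):
        # Copy into a local dict owned by this process.
        self._table = dict(standardized_rewrites)

    def rewrite(self, query: str) -> str:
        # Lower-case and split on whitespace, then replace each token that
        # has a rewrite; tokens without a rewrite pass through unchanged.
        return " ".join(self._table.get(tok, tok) for tok in query.lower().split())
```

For example, `InMemoryRewriter({"fone": "smartphone"}).rewrite("Cheap fone case")` yields `"cheap smartphone case"`. Multi-token triggers would need a different index structure, which this sketch does not attempt.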
As described herein, a rewrite or query rewrite refers to a translation used by a search engine that changes or transforms all or part of a user query into another form. A query rewrite includes at least a trigger, which is a value or a set of values and logical operators to be transformed, and a rewrite value, which is the transformation value applied to the trigger. A query rewrite refers to the transform as it is used by the search engine and in the form in which it is used by the search engine. This may include the use of specific file formats, text configuration, and a streamlined set of elements that is different from the set of elements in an associated QRIL record. A query rewrite as used herein is therefore different from a query rewrite input language (QRIL) record, though query rewrites and QRIL records are discussed together in detail below. Standardized query rewrites are rewrites that have been created by a QRIL processor from QRIL records in order to eliminate conflicts and to apply a standardized set of rules to application of the rewrites described by the QRIL records.
A QRIL record as described herein refers to a domain specific data structure which describes a query rewrite along with other information about the query rewrite that enables a system to resolve conflicts between different query rewrites, as well as to cure ambiguities in query rewrites that are not sufficiently defined in accordance with the expectations of a search engine. Systems and methods for generating and using QRIL records along with their associated query rewrites are described in detail below.
As described herein, a user query refers to information received by a search engine system from a client device that represents a user's search for information. A user query may, in various embodiments, take various forms. In one particular embodiment, a user query comprises a string of characters. The string may include multiple words, symbols, spaces, or numbers in any format.
A search engine as referred to herein is one or more devices configured to receive a user query and to search information available to the search engine to create a list of matches related to the information in the user query. Any number of different matching algorithms may be used by search engines in accordance with the embodiments described herein. Query rewriting as detailed herein particularly enables the matches generated by a search engine to be adjusted by a system operator. While similar adjustments may be made by a system operator that adjusts the matching algorithms used by the search engine, query rewriting enables a system operator to make such adjustments without risking the integrity of the matching algorithms. Where adjusting weights within a matching algorithm carries a significant risk of impacting matching results in unexpected ways, embodiments of standardized rewrites and search engines using such standardized rewrites described herein enable a user to influence the set of search results output by a search engine in defined and predictable ways using query rewrites that leave the matching algorithm intact and unchanged. Rather than altering the matching algorithm, standardized rewrites adjust the inputs to the matching algorithm in order to customize or adjust search engine operation as desired by a system operator or other system user with an ability to generate query rewrites. Similarly, in a large complex system involving data mining, third parties, ecommerce sales pages, search engines associated with large numbers of ecommerce sales pages and products, and additional system complexities, a standardized query rewrite system enables decoupling of elements of ecommerce searching from complex search engine systems. This also formalizes query rewrites in a way that enables these different parties to readily understand individual query rewrites, and further formalizes the interaction of a specific query rewrite with every other query rewrite in the system.
Additionally, while use of query rewrites maintains the integrity of search engine matching algorithms, unstructured query rewrites may conflict with each other. For example, a first query rewrite may translate “Smartphone A” into “Product B.” A second query rewrite may translate “Smartphone A” into “Product Characteristic C.” A third query rewrite may translate every token instance of “smartphone” into “device A.” These query rewrites may interact in complex and unpredictable ways. This is especially true if the source of the first query rewrite is different from the source of the third query rewrite, so that the creator of the first query rewrite is unaware of the other overlapping or conflicting query rewrites. Large search engine embodiments may include millions of query rewrites. A search engine for a large e-commerce system may, for example, include more than 25 million rewrites. Embodiments described herein provide for standardized precedence rules which determine how conflicts and interactions between different query rewrites operating in the same system are resolved.
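The two kinds of interaction in the example above can be detected mechanically: a trigger mapped to more than one rewrite value, and a trigger whose tokens are contained in another trigger's tokens so that both rewrites could fire on one query. The sketch below is an illustration under assumptions (case-insensitive triggers, whitespace tokenization, rewrites given as `(trigger, value)` pairs); `find_conflicts` is a hypothetical name, not part of the described system.

```python
from collections import defaultdict


def find_conflicts(rewrites: list[tuple[str, str]]):
    """Report (1) triggers mapped to more than one rewrite value and
    (2) pairs of triggers where one trigger's tokens are a strict subset
    of the other's, so that both rewrites could fire on the same query."""
    values_by_trigger: dict[str, set[str]] = defaultdict(set)
    for trigger, value in rewrites:
        values_by_trigger[trigger.lower()].add(value)

    # Same trigger, different rewrite values.
    direct_conflicts = {t: v for t, v in values_by_trigger.items() if len(v) > 1}

    # Token containment between distinct triggers (pairwise, O(n^2)).
    overlaps = []
    triggers = sorted(values_by_trigger)
    for t1 in triggers:
        for t2 in triggers:
            if t1 != t2 and set(t1.split()) < set(t2.split()):
                overlaps.append((t1, t2))
    return direct_conflicts, overlaps
```

Applied to the three rewrites above, this flags “smartphone a” as having two rewrite values and reports the pair (“smartphone”, “smartphone a”). The pairwise scan is only for illustration; at the scale of millions of rewrites, an index from tokens to triggers would be needed.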
Aspects of the embodiments described herein relate to classification of rewrite types. Certain embodiments may use different classifications of rewrite types. As discussed herein, “direct” or basic rewrites, “phrase rewrites,” “token refinements,” and “whole query rewrites” are each a type of rewrite. Other implementations may include other classifications of rewrites.
As referred to herein, direct rewrites involve a trigger directly associated with a rewrite value. While a direct rewrite may have additional associated aspects, including various constraints, categories, and metadata, the basic structure is the direct association between the trigger and the rewrite value. An example of a direct rewrite structure including additional information associated with an ecommerce search engine is:
- Constraints=[Ecommerce site where Query was issued, Trigger, Category Constraint, Query Origin Country]
- Rewrite=[Rewrite Value, Category Rewrite, Aspect Rewrite, Item Listing Siteid]
An example rewrite using the above structure is:
- Constraints=[Ecommerce Site where Query was issued=“USA”, Keyword Trigger=“fone”, Category=“electronics”, Query Origin Country=“Canada”]
- Rewrite=[Keyword Rewrite=“smartphone”, Category Rewrite=“123456”, Aspect Rewrite=“None”, Item Listing Siteid=“Canada”]
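The direct rewrite structure above can be modeled as a trigger, a set of constraints, and a set of rewrite fields, where the rewrite fires only when the trigger token appears in the query and every constraint matches the context in which the query was issued. This is a minimal sketch; the `DirectRewrite` class and the dictionary keys (`site`, `category`, `origin_country`, `keyword`, and so on) are hypothetical stand-ins for the bracketed fields in the example.

```python
from dataclasses import dataclass


@dataclass
class DirectRewrite:
    trigger: str                 # keyword trigger, e.g. "fone"
    constraints: dict[str, str]  # site, category, query origin country, ...
    rewrite: dict[str, str]      # keyword, category, aspect, listing site id

    def applies_to(self, query: str, context: dict[str, str]) -> bool:
        # Fire only if the trigger token appears in the query and every
        # constraint matches the context in which the query was issued.
        return (self.trigger in query.lower().split()
                and all(context.get(k) == v for k, v in self.constraints.items()))

    def apply(self, query: str) -> str:
        # Replace the trigger token with the keyword rewrite value.
        return " ".join(self.rewrite["keyword"] if t == self.trigger else t
                        for t in query.lower().split())


fone_rule = DirectRewrite(
    trigger="fone",
    constraints={"site": "USA", "category": "electronics", "origin_country": "Canada"},
    rewrite={"keyword": "smartphone", "category": "123456",
             "aspect": "None", "item_listing_siteid": "Canada"},
)
```

With a context of site "USA", category "electronics", and origin country "Canada", the query "red fone" would be rewritten to "red smartphone"; the same query from a different origin country would be left alone.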
As used herein, a phrase rewrite involves rewriting a trigger to a phrase, where a phrase is defined as a sequence of contiguous word tokens. This differs from a direct rewrite in that a direct rewrite may have a rewrite value which is a single token, whereas the rewrite value of a phrase rewrite is a phrase involving multiple tokens. Additionally, while a direct rewrite may have a rewrite value with multiple tokens, the token order for the rewrite value of a direct rewrite is not specified. A phrase rewrite therefore enables recall of a more specific set of items than a corresponding direct rewrite. For example, a direct rewrite with trigger “built in camera” and rewrite value “built in rear camera” will match more items than a phrase rewrite with the same trigger “built in camera” and the rewrite value “built in PHRASE(rear camera),” because the direct rewrite matches any item containing all four tokens in any position, while the phrase rewrite additionally requires “rear camera” to appear as contiguous tokens. In certain circumstances, the phrase rewrite is preferable since it will match a more precise set of items.
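The difference in recall between the two rewrite types can be shown with a toy matcher over item titles. This sketch assumes whitespace tokenization and represents the rewrite value “built in PHRASE(rear camera)” as a list of free tokens plus one contiguous phrase; the function names and the item titles are made up for illustration.

```python
def matches_direct(item_title: str, rewrite_value: str) -> bool:
    """Direct rewrite: every token must appear somewhere in the title, in any order."""
    return set(rewrite_value.lower().split()) <= set(item_title.lower().split())


def matches_phrase(item_title: str, required: list[str], phrase: str) -> bool:
    """Phrase rewrite: the phrase tokens must appear contiguously and in order,
    and each remaining required token must appear anywhere in the title."""
    title = item_title.lower().split()
    words = phrase.lower().split()
    contiguous = any(title[i:i + len(words)] == words
                     for i in range(len(title) - len(words) + 1))
    return contiguous and all(t in title for t in required)
```

A title such as “Built in camera with rear flash” satisfies the direct rewrite (all four tokens are present) but not the phrase rewrite, since “rear” and “camera” are not adjacent; “Phone with built in rear camera” satisfies both.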
As used herein, a token refinement refers to a rewrite that involves adding or dropping keywords from a trigger. For example, if the trigger “cheap new princess smartphone cases” does not provide an acceptable set of search results, a system may use a token refinement rewrite to drop words (i.e., tokens) from the query. If the trigger is seen often enough in user queries, the system may gather information sufficient to determine that the tokens “cheap” and “new” are not key to the search result elements that a user is typically trying to retrieve using this search query. A token refinement deleting these terms may thus be used to rewrite “cheap new princess smartphone cases” to “princess smartphone cases.” This is an example of token dropping. Conversely, token refinement may also be used to add words to a query. For example, a trigger “brandA” may have a token refinement rewrite value of “model # 123,” which may be the only popular product within an ecommerce search engine associated with brand A. A token refinement is a change to a trigger rather than a conventional rewrite that replaces a trigger value in a user query with a rewrite value. While certain token refinements may have the same functional effect as a direct rewrite in some circumstances, the creation of categories for direct rewrites and token refinement rewrites enables conflict resolution and certain types of QRIL record structures in various embodiments.
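Token dropping and token adding can each be sketched as a small transformation of the query's token list. Here the set of droppable tokens and the tokens to add are passed in directly; in the description they would come from analysis of user queries, which this sketch does not model. The function names are hypothetical.

```python
def drop_tokens(query: str, droppable: set[str]) -> str:
    """Token dropping: remove tokens judged not to matter for this query."""
    return " ".join(t for t in query.split() if t not in droppable)


def add_tokens(query: str, additions: list[str]) -> str:
    """Token adding: append refinement tokens that are not already present."""
    tokens = query.split()
    return " ".join(tokens + [t for t in additions if t not in tokens])
```

With these helpers, `drop_tokens("cheap new princess smartphone cases", {"cheap", "new"})` gives `"princess smartphone cases"`, and `add_tokens("brandA", ["model", "#123"])` gives `"brandA model #123"`.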
System 100 includes query rewrite sources 110, query transcoding device 120, QRIL record database 130, QRIL processor 140, production database 150, and search engine 160. The set of standardized rewrites 142 is also illustrated as an output of QRIL processor 140 that is communicated to search engine 160, production database 150, or both.
As shown by system 100, query rewrite sources 110 comprises a number of different rewrite sources. This may include any number of the example rewrite sources shown as well as other types of rewrite sources. Query rewrite sources 110 is illustrated as including query database 112, data mining module 114, rewrite optimization module 116, and editorial web service module 118.
Query database 112 comprises a local database of ad hoc query rewrites or a set of ad hoc query rewrites from a variety of networked database sources. For example, query database 112 may include a set of query rewrites for a search engine that is different from search engine 160. This may include search engines which use a different query rewrite format and/or structure than that used by search engine 160. This information may be sent to query transcoding device 120 as a set of query rewrite data.
Data mining module 114 comprises a system that analyzes user queries, the search results that a search engine returns in response to those user queries, and the user selection following a user's receipt of the search results. Such a user selection may include selection of a link to a particular website, the user's purchase of a product that was listed in the search results, or any other recorded user action taken following a user's receipt of the search results associated with the user query. Such data may additionally include information about different query rewrites that were used with different users that submitted the same initial search query. With a sufficiently large data set, statistical information and analysis may be generated for particular input queries, query rewrites, search results, and user responses. Data mining module 114 may analyze such information to generate sets of query rewrite data.
Rewrite optimization module 116 comprises a database of rewrites such as production database 150. For example, standardized rewrites 142 from production database 150 may be communicated to rewrite optimization module 116. Rewrite optimization module 116 may then analyze the set of standardized rewrites 142 to identify inefficiencies or redundant rewrites, or to generate new rewrites based on rewrites present as part of the set of standardized rewrites 142. The new rewrites identified by rewrite optimization module 116, or any redundant or inefficient rewrites identified by rewrite optimization module 116, may be communicated as a set of query rewrite data to query transcoding device 120.
Editorial web service module 118 comprises a service portal that enables third parties to access system 100 to generate customized QRIL records and associated standardized rewrites. For example, editorial web service module 118 may include a registration server that enables a merchant that sells products on an e-commerce portal associated with search engine 160 to submit sets of query rewrite data to query transcoding device 120. In such embodiments, the merchant may be associated with a particular constraint. For example, the merchant may have a storefront or portal as part of the e-commerce site associated with search engine 160. QRIL records generated from sets of query rewrite data provided by the merchant may automatically include a constraint that limits standardized rewrites generated from those QRIL records to the merchant's storefront. Additionally, because QRIL processor 140 implements precedence rules, system 100 limits the potential errors that may be introduced by sets of query rewrite data from third parties that are received via editorial web service module 118.
Query transcoding device 120 accepts sets of query rewrite data from query rewrite sources 110 and uses this information to generate QRIL records. Such QRIL records may be generated exclusively from information received from a single query rewrite source 110, or a QRIL record may be generated from query rewrite data received from multiple sources. In certain embodiments, history data stored by query transcoding device 120 may be used in conjunction with query rewrite data from query rewrite sources 110 to generate a QRIL record. Additional details related to query transcoding and query transcoding device 120 are discussed below with respect to query transcoding system 400 of
Once one or more QRIL records are generated by query transcoding device 120, the QRIL records are stored at QRIL record database 130. QRIL record database 130 may be a memory storage device that is integrated with query transcoding device 120, QRIL processor 140, or any other device. QRIL record database 130 stores sets of QRIL records which may be used to generate sets of standardized rewrites such as set of standardized rewrites 142. In certain embodiments, QRIL record database 130 may include separate sets of QRIL records. This may enable a single query transcoding device 120, QRIL record database 130, and QRIL processor 140 to provide sets of standardized rewrites that are distinct to different search engines.
When a set of standardized rewrites 142 is to be generated for search engine 160, QRIL processor 140 uses QRIL records from QRIL record database 130. In certain embodiments, each QRIL record may be retrieved individually, or a set of QRIL records may be requested by QRIL processor 140 all at one time. QRIL processor 140 then analyzes the set of QRIL records from QRIL record database 130 to generate the set of standardized rewrites 142. As part of this process, a rewrite type associated with each QRIL element may be identified, and other constraint and/or meta-flag information may be processed both to generate a standardized rewrite and to resolve any conflicts between standardized rewrites defined by different QRIL elements. The set of standardized rewrites 142 is the output of QRIL processor 140 that results from QRIL processor 140 analyzing the QRIL records from QRIL record database 130. When the set of standardized rewrites 142 is complete, it may be output from QRIL processor 140 to production database 150. In various embodiments, production database 150 is optional. As described above, production database 150 may be used to verify the actual standardized rewrites which are active in search engine 160. Production database 150 may also be used by rewrite optimization module 116 to further refine rewrites in later updated versions of the set of standardized rewrites 142. Production database 150 may also be used with a test search engine to verify the impact of certain QRIL records on standardized rewrites and on the search results associated with user queries that are rewritten by the standardized rewrites. For example, editorial web service module 118 may provide a merchant access to a test search engine, which is not shown, as well as to the rewrites of the set of standardized rewrites 142 related to the merchant in production database 150.
Editorial web service module 118 may enable a merchant to provide a set of query rewrite data that will be processed by query transcoding device 120 and QRIL processor 140 to generate a nonproduction set of standardized rewrites based on the merchant's changes in the merchant's set of query rewrite data. The merchant may then submit test queries to observe how these test query rewrites interact with previously existing standardized rewrites to generate a set of search results within the test search engine.
Search engine 160 may be any search engine which uses query rewrites such as the set of standardized rewrites 142. As mentioned above, particular e-commerce related search engines are detailed herein, particularly in search engine 800 of
System 100 describes one potential implementation of a system for generating QRIL records and associated standardized rewrites, as well as for using standardized rewrites generated from QRIL records in a search engine. In various embodiments, each of the elements of system 100 may be implemented as a module in a single device or multiple devices. Such elements may also be implemented as separate devices or as systems operating across multiple devices. As such, query transcoding device 120 may be a module operating on the same device as QRIL processor 140. Alternatively, query transcoding device 120 may be a networked system of computing devices which are further networked to one or more devices which make up QRIL processor 140.
Operation 205 is an optional registration step as described above with respect to editorial web service module 118. Such a registration may enable certain system users to generate QRIL records with constraint values that limit the rewrites associated with the QRIL records to searches particularly associated with the system user that generates the QRIL records. An example of such an association may be a merchant operating a virtual storefront with access to a broader publication system such as system 700. Such a QRIL record may include a constraint that limits the associated standardized rewrites to applying only to queries received from the merchant's virtual storefront. Operation 205 may occur when a third party such as a merchant, a search consultant, a system user, a middleware provider, or any other such third party is provided access to system 100. In some embodiments, operation 205 is a registration with query transcoding device 120. In various other embodiments, an intermediate editorial web service module 118 may entirely handle the registration system, or additional security layers and user interface layers may be presented to handle registration, access, and various other account details. In other embodiments, query rewrite sources 110 and query transcoding device 120 may be communicatively coupled as part of a network or some other communication path, without the need for an associated registration process.
In operation 210, query rewrite data is received by query transcoding device 120. This may be in response to an operator selection or an automatic update of query rewrite data that is periodically provided to query transcoding device 120 as part of a system update. In embodiments where the query rewrite data is provided to query transcoding device 120 in response to an operator selection, the selection may be made by a user operating a machine such as third party server 730, client machine 710, or client machine 712, described in more detail below. As part of the operation of such devices, third party application 728, web client 706, or programmatic client 708 may include a user interface with an input selection that enables the user to transmit query rewrite data to query transcoding device 120. Such applications or clients may communicate with query transcoding device 120 or an intermediate registration device or application to register with system 100 as part of the previous operation 205. Options for automatic communication of query rewrite data or user-selected communication of query rewrite data may be selected by the user as part of registration, or may be set automatically by predetermined system settings.
In operation 215, the query rewrite data that is received in operation 210 is analyzed to identify a trigger and associated rewrite value. As used herein, a trigger refers to characters, words, phrases, symbols, or any other sets of information which, when received as part of a user query, are used to initiate a rewrite that transforms those sets of information into another form. For example, the phrase “smart phone” may be a trigger, and an associated rewrite value may be “brand A phone.” If the query rewrite data is received from a query database 112 that includes sets of query rewrites, the trigger and associated rewrite value may be explicitly identified in the query rewrite data. In this case, a character parser may be used to identify the trigger and the rewrite value from the query rewrite data. If the trigger and associated rewrite value are not explicitly identified by a character parser that analyzes the query rewrite data, additional analysis may be performed to identify a trigger and rewrite value, or the query rewrite data may be flagged by query transcoding device 120 as data not containing a trigger or rewrite, and a QRIL record may not be created from this data. Additional details associated with trigger identification and rewrite identification are described below with respect to query transcoding device 400 of
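A character parser for query rewrite data that explicitly identifies its trigger and rewrite value might look like the following sketch. The line format `trigger => rewrite value` is purely an assumption made for illustration; the description does not specify any input format. Data that does not match is returned as `None`, corresponding to data flagged as not containing a trigger or rewrite.

```python
import re

# Hypothetical input line format: "<trigger> => <rewrite value>".
# The trigger must start with a non-space, non-"=" character.
_RULE = re.compile(r"^\s*(?P<trigger>[^=\s][^=]*?)\s*=>\s*(?P<rewrite>.+?)\s*$")


def parse_rewrite_line(line: str):
    """Return (trigger, rewrite_value), or None if the line does not
    explicitly identify both, in which case no QRIL record is created."""
    m = _RULE.match(line)
    if m is None:
        return None
    return m.group("trigger"), m.group("rewrite")
```

For example, `parse_rewrite_line("smart phone => brand A phone")` returns `("smart phone", "brand A phone")`, while a line with no `=>` separator or an empty trigger returns `None`.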
In operation 220, a query rewrite type is assigned to the identified trigger and rewrite. The query rewrite type is used to determine the priority or precedence level that the standardized rewrite derived from the query rewrite data will receive. The query rewrite type is determined by a structure of the rewrite, by supporting data or metadata associated with the rewrite as part of the query rewrite data, or by both. The rewrite above, where “smart phone” is associated with the rewrite “brand A phone,” is referred to herein as a direct rewrite. The structure of a direct rewrite comprises a trigger and a rewrite value. This is the simplest structure, where the rewrite involves replacing the trigger with the rewrite value. Additional examples of rewrite types include phrase rewrites, token refinement rewrites, and whole query rewrites. Additional details related to query rewrite types and specific example embodiments of different query rewrite types are discussed below with respect to
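Since the rewrite type may be determined by supporting metadata, by the structure of the rewrite, or by both, one possible assignment order is sketched below: an explicit type in the metadata wins, a `PHRASE(...)` marker in the rewrite value indicates a phrase rewrite, and a rewrite value that only removes tokens from the trigger is treated as a token-dropping refinement. These heuristics and the `rewrite_type` metadata key are assumptions for illustration; token additions and whole query rewrites are left to explicit metadata here.

```python
def assign_rewrite_type(trigger: str, rewrite_value: str, metadata: dict) -> str:
    # An explicit type in the supporting metadata takes priority (assumed key).
    if "rewrite_type" in metadata:
        return metadata["rewrite_type"]
    # The PHRASE(...) notation marks contiguous tokens in the rewrite value.
    if "PHRASE(" in rewrite_value:
        return "phrase"
    trig, rew = set(trigger.split()), set(rewrite_value.split())
    # A rewrite value that only removes tokens from the trigger is treated
    # as a token-dropping refinement (an assumed heuristic).
    if rew < trig:
        return "token_refinement"
    # Otherwise fall back to the simplest structure: trigger -> rewrite value.
    return "direct"
```

Under these rules, “smart phone” to “brand A phone” and “built in camera” to “built in rear camera” are both classified as direct rewrites, consistent with the examples in this description, while “cheap new princess smartphone cases” to “princess smartphone cases” is classified as a token refinement.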
In operation 225, any other related constraint or metadata information present in the query rewrite data may be identified. Similar to the identification of the trigger and the rewrite value, this other related constraint or metadata information may be present in the data as sets of characters, and a character parser may identify character groupings which are known to match certain constraints related to elements of a QRIL record. An example QRIL record including a number of different QRIL elements is illustrated by QRIL record 700 and the various components of QRIL record 700 illustrated by
Additionally, in operation 225, the QRIL record is generated from the trigger, rewrite value, query rewrite type, and related constraint or metadata information identified in operations 215 and 220. Such a record may be generated using a processor to create the record structure and to gather text, symbol, or other operator information from a parser used in operations 215 and 220. Additional details of systems that may be used for QRIL record generation are discussed with respect to
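Assembling a QRIL record from the parsed pieces could look like the sketch below, which sorts `key=value` groups from the raw query rewrite data into constraints (keys the parser recognizes) and metadata (everything else). The semicolon-separated format, the `KNOWN_CONSTRAINTS` set, and the `QrilRecord` fields are all hypothetical; the actual QRIL record elements are those described with respect to QRIL record 700.

```python
from dataclasses import dataclass, field

# Hypothetical set of constraint keys the parser recognizes.
KNOWN_CONSTRAINTS = {"site", "category", "origin_country"}


@dataclass
class QrilRecord:
    trigger: str
    rewrite_value: str
    rewrite_type: str
    constraints: dict[str, str] = field(default_factory=dict)
    metadata: dict[str, str] = field(default_factory=dict)


def build_record(trigger: str, rewrite_value: str, rewrite_type: str,
                 extra_fields: str) -> QrilRecord:
    """Sort 'key=value' groups from the raw rewrite data into constraints
    (recognized keys) and metadata (everything else), then assemble the record."""
    record = QrilRecord(trigger, rewrite_value, rewrite_type)
    for group in extra_fields.split(";"):
        key, sep, value = group.partition("=")
        if not sep:
            continue  # not a key=value group; ignore it
        key, value = key.strip(), value.strip()
        target = record.constraints if key in KNOWN_CONSTRAINTS else record.metadata
        target[key] = value
    return record
```

For example, `build_record("fone", "smartphone", "direct", "site=USA; category=electronics; allow_recursive=false")` places `site` and `category` among the constraints and keeps `allow_recursive` as a metadata flag.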
In operation 230, the system checks whether additional QRIL records can be generated from the received query rewrite data, or whether there is additional query rewrite data to be received and analyzed from one or more query rewrite sources 110. If additional query rewrite data is still to be analyzed, the process continues in operation 235 with processing additional rewrite data from one or more sources. This query rewrite data may be from a single query rewrite source 110, or from multiple query rewrite sources 110. This may include any of the sources shown as part of query rewrite sources 110, including query database 112, data mining module 114, rewrite optimization module 116, or editorial web service module 118. Operations 215 through 230 are then repeated until no additional query rewrite data remains to be processed. In various embodiments, this processing of query rewrite data in operations 210 through 230 may be performed simultaneously using any number of processors, query transcoding devices 120, or other modules or devices that perform such operations. In other embodiments, this processing may be a set of operations performed periodically, or performed whenever a trigger identifying new query rewrite data is received. In certain embodiments, QRIL records may be generated in operations 210 through 230 and aggregated so that QRIL records generated at different times are all communicated to a QRIL record database together. In other embodiments, each QRIL record is stored in QRIL record database 130 as it is generated. In certain embodiments, a single query transcoding device 120 may send QRIL records to multiple databases, and a target database may be determined by information identified from query rewrite data, by an identity of a query rewrite source 110, or by information received as part of a registration in operation 205.
If no additional rewrite data is identified in operation 230, then all of the QRIL records are stored at QRIL record database 130 in operation 240. The QRIL records stored at QRIL record database 130 may be stored for later use such that there is a delay in time between operation 240 and operation 245, or updates and new QRIL records stored in QRIL record database 130 may be immediately communicated to a QRIL processor for analysis.
In operation 245, the QRIL records are analyzed by one or more QRIL processors. In certain embodiments, individual QRIL records may be analyzed serially by a single QRIL processor. In other embodiments, QRIL records may be analyzed in parallel by one or more QRIL processors such as QRIL processor 140. The QRIL processor analysis determines the format associated with a search engine and the information from a QRIL record that is needed to generate a standardized rewrite in the format acceptable to the search engine. While method 200 describes one example implementation of QRIL processor analysis and standardized rewrite generation, additional details and other aspects of QRIL processor operation which may be used in different embodiments are described below with respect to QRIL processor 540 of
As part of operation 245, the query type included as an element of the QRIL record may be identified by the QRIL processor 140 and various different processing operations may be implemented based on the query type of the QRIL record. Details associated with different query types are discussed below, and the characteristics of different query types may be used by the QRIL processor 140 during operation 245. Following an initial analysis of a QRIL record in operation 245, rewrite conflicts and precedence rules may be used to generate one or more rewrites in operations 250 through 290 as detailed below.
In addition to the different types of query rewrites discussed above, certain QRIL records and associated rewrites may involve recursive rewrites. Operation 250 checks a QRIL record for settings associated with recursive rewrites. The term recursive rewrites refers to chains of rewrites that may occur when a rewrite value associated with a first rewrite is a trigger associated with a second rewrite. For example, if a first direct rewrite has a trigger “fone” and a rewrite value “smartphone,” and a second direct rewrite has a trigger “smartphone” and a rewrite value “phone model #12345,” then a chain of rewrites may result in the token “fone” in a user's query being rewritten to “phone model #12345.” The check of operation 250 may involve a QRIL record element which indicates whether recursive rewrites are allowed or enabled for the rewrite associated with a QRIL record. In certain embodiments, a QRIL record may include an element which specifically allows or specifically prohibits a rewrite value from being used as a trigger for further rewrites. In other embodiments, system rules may determine whether recursive rewrites are allowed. If recursive rewrites are allowed, the system may proceed to analyze any related QRIL records or previously generated standardized rewrites. For example, if the rewrite with the trigger “fone” is part of a QRIL record which indicates that recursive rewrites are not allowed, then the second rewrite, which is part of the set of standardized query rewrites, is ignored during the generation of the standardized rewrite for this QRIL record. If, however, recursive rewrites are allowed, then in operation 255 the QRIL processor 140 checks for rewrites that have a trigger which matches all or part of the rewrite value for the QRIL record being processed. This may include checking all QRIL records in QRIL record database 130. This may also involve checking all standardized rewrites from a current set of standardized rewrites 142.
If applicable rewrites are found during operation 255, then the recursive rewrite is analyzed in a repeat of operation 245. The recursive rewrite is then checked for a further (double) recursive rewrite in a repeat of operation 250. This process proceeds in a nested fashion until there are no further recursive rewrites, or until a system limit on recursive rewrites is reached. In certain embodiments, a single QRIL record may have two nested rewrites originating from the same trigger. For example, if the first QRIL record has a rewrite value of “Belgian double chocolate,” and applicable triggers exist for both “Belgian” and “double chocolate,” then, if no other constraints prevent it, nested rewrites for both “Belgian” and “double chocolate” may be analyzed, and their respective rewrite values used in the creation of the standardized rewrite.
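The recursion in operations 245 through 255, including a per-record flag that disables derived rewrites and a system limit on nesting depth, can be illustrated with the following minimal sketch. The `Rewrite` class, the `derived_rewrite_disabled` field, the `max_depth` default of 3, and the longest-span-first matching strategy are assumptions made for illustration; the description above does not specify how matching spans within a rewrite value are chosen.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Rewrite:
    trigger: str
    value: str
    derived_rewrite_disabled: bool = False  # hypothetical recursion flag


def expand(rewrite: Rewrite, rewrites_by_trigger: Dict[str, Rewrite],
           max_depth: int = 3, depth: int = 0) -> str:
    """Follow a chain of recursive rewrites starting from `rewrite`.

    Any contiguous span of tokens in the rewrite value that is itself a trigger
    is replaced by that rewrite's (recursively expanded) value.
    """
    if rewrite.derived_rewrite_disabled or depth >= max_depth:
        return rewrite.value  # recursion not allowed, or system limit reached
    tokens = rewrite.value.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so "double chocolate" beats "double".
        for j in range(len(tokens), i, -1):
            nested = rewrites_by_trigger.get(" ".join(tokens[i:j]))
            if nested is not None and nested is not rewrite:
                out.append(expand(nested, rewrites_by_trigger, max_depth, depth + 1))
                i = j
                break
        else:  # no trigger starts at token i
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

With rewrites for “fone” to “smartphone” and “smartphone” to “phone model #12345”, expanding the “fone” rewrite yields “phone model #12345”; setting `derived_rewrite_disabled=True` on the first rewrite stops the chain at “smartphone”. The depth limit also guarantees termination when two rewrites point at each other.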
In operation 260, the QRIL processor 140 may determine if any conflicts exist with the rewrite. Examples of conflicts include rewrites with the same trigger and different rewrite values. Operation 260 may involve QRIL processor 140 checking the rewrite for the current QRIL record against other QRIL records, against previously generated standardized rewrites that have already been incorporated into a set of standardized rewrites by the QRIL processor 140, or both.
If a conflict is identified, then in operation 265, the system analyzes the rewrites that are in conflict and applies precedence rules to resolve the conflict. Conflict resolution is required when the same trigger is associated with multiple different rewrites, and one rewrite conflicts with one or more of the other rewrites. This may occur, for example, when a phrase rewrite and a direct rewrite have identical triggers with the same tokens. In such a circumstance, the phrase rewrite will typically match only a subset of the items that are matched by the corresponding direct rewrite. The system may resolve such conflicts with fixed rules. One embodiment provides that when a phrase rewrite and a direct rewrite include the same triggers, the direct rewrite is dropped, and the phrase rewrite is used by the system as providing the more succinct set of matches. Another embodiment assesses an expected set of results from two conflicting rewrites and selects the rewrite with the greater amount of rewrite detail, which would be expected to return a narrower set of search results. This may be assessed based on a number of characters or tokens in a rewrite value. This may also be assessed based on a metaflag value or other related information in a QRIL record. For example, a QRIL record may include a metaflag element for a precedence score or a detail value. Such a metaflag value may be used to determine which QRIL record is applied when a trigger is part of a search query, or the order in which conflicting rewrites are applied.
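A minimal sketch of the two resolution strategies in this paragraph follows: the fixed rule that keeps a phrase rewrite over a direct rewrite with the same trigger, and a detail-based rule for other conflicts. The `Rewrite` fields, the optional `precedence_score` standing in for a precedence metaflag, and the choice to consult that score before token and character counts are assumptions; the paragraph lists these signals without fixing their order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rewrite:
    trigger: str
    value: str
    rewrite_type: str                       # e.g. "direct" or "phrase"
    precedence_score: Optional[int] = None  # hypothetical precedence metaflag


def detail_key(rw: Rewrite):
    """Explicit precedence score first (if any), then token count, then character count."""
    score = rw.precedence_score if rw.precedence_score is not None else float("-inf")
    return (score, len(rw.value.split()), len(rw.value))


def resolve_conflict(a: Rewrite, b: Rewrite) -> Rewrite:
    if a.trigger != b.trigger:
        raise ValueError("rewrites with different triggers do not conflict")
    # Fixed rule: a phrase rewrite is kept over a direct rewrite with the same
    # trigger, because it matches a subset of the direct rewrite's items.
    if {a.rewrite_type, b.rewrite_type} == {"phrase", "direct"}:
        return a if a.rewrite_type == "phrase" else b
    # Otherwise keep the more detailed value, expected to give narrower results.
    return max(a, b, key=detail_key)
```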
In one potential embodiment, certain conflicting rewrites are given precedence based on query type. For example, a whole query rewrite may be given priority, as a whole query rewrite is an exact match to a user query string. The whole query rewrite includes a specific rewrite value with no derivative transformations or recursive rewrites, as the whole query rewrite is specifically tailored to an exact user query. The whole query rewrite is thus a priority rewrite, and any conflicting rewrites of a different type will not be executed in view of the precedence of the whole query rewrite. Because a whole query rewrite has a trigger which is an exact match to a user query, conflicting whole query rewrites may raise an error flag to be output to a system operator. In embodiments without such a conflict error output, whole query conflicts may be resolved as described above, with the rewrite value containing the greatest amount of detail taking precedence.
Continuing with the example embodiment of conflict resolution discussed for whole query rewrites, in this embodiment a token adjustment rewrite may take precedence after a whole query rewrite, and a phrase rewrite may take precedence over a direct rewrite as described above. Any conflicts between rewrites of the same type may be resolved as described above in favor of the narrowest rewrite value. If derivative or recursive rewrites are allowed, such that rewrite values in a rewritten query may act as triggers for additional rewrites, then each level of recursion following a completed rewrite may use the same rules discussed above to resolve rewrite conflicts at each level of derivation.
When the conflict resolution is confirmed, all of the related conflicting rewrites are updated in the set of standardized rewrites as part of operation 275. In certain embodiments, this may involve removing one of the rewrites from the set of standardized rewrites. In other embodiments, this involves selecting the rewrite order, such that the first rewrite will be used, and after the trigger is transformed with the rewrite value, the other trigger will no longer apply. In one potential embodiment of a set of precedence rules, whole query rewrites have precedence over all other rewrites, direct rewrites have precedence over phrase rewrites and token refinements, and phrase rewrites have precedence over token refinements. Rewrites of the same type may be given priority based on the level of detail (e.g., a number of characters, tokens, or symbols) in the rewrite value, with a higher level of detail (e.g., more characters) having priority over a lower level of detail. In certain embodiments, a QRIL record may have a priority metaflag element that is used to resolve conflicts between rewrites of the same type.
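The ordering embodiment in this paragraph, where rewrites are applied in precedence order so that a trigger consumed by an earlier rewrite no longer applies, can be sketched as follows. The precedence table follows the ordering stated in this paragraph (whole query, then direct, then phrase, then token refinement); other embodiments above order phrase and direct rewrites differently, so the table is only one option. The names are hypothetical, every non-whole-query rewrite is simplified to a token-span replacement, and tokens produced by a rewrite value are locked so that this sketch performs no recursive rewriting.

```python
from dataclasses import dataclass
from typing import List

# One precedence ordering described above; a lower number is applied first.
TYPE_PRECEDENCE = {"whole_query": 0, "direct": 1, "phrase": 2, "token_refinement": 3}


@dataclass
class Rewrite:
    trigger: str
    value: str
    rewrite_type: str
    priority: int = 0  # hypothetical priority metaflag for same-type conflicts


def order_rewrites(rewrites: List[Rewrite]) -> List[Rewrite]:
    """Sort by type precedence, then priority metaflag, then detail of the value."""
    return sorted(
        rewrites,
        key=lambda rw: (
            TYPE_PRECEDENCE[rw.rewrite_type],
            -rw.priority,
            -len(rw.value.split()),
            -len(rw.value),
        ),
    )


def apply_in_order(query: str, rewrites: List[Rewrite]) -> str:
    # Each entry is (token, locked); locked tokens came from a rewrite value and
    # are never matched again, so a later overlapping trigger no longer applies.
    tokens = [(t, False) for t in query.split()]
    for rw in order_rewrites(rewrites):
        trig = rw.trigger.split()
        if rw.rewrite_type == "whole_query":
            if [t for t, _ in tokens] == trig:
                return rw.value  # exact match to the whole user query
            continue
        n = len(trig)
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if [t for t, _ in window] == trig and not any(locked for _, locked in window):
                tokens = tokens[:i] + [(t, True) for t in rw.value.split()] + tokens[i + n:]
                break  # rewrite the first unlocked occurrence only
    return " ".join(t for t, _ in tokens)
```

For the query “red fone case” with a direct rewrite “fone” to “phone” and a phrase rewrite “red fone” to “red smartphone”, the direct rewrite is applied first and the phrase trigger no longer matches, giving “red phone case”.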
In operation 270, after all elements of the QRIL record have been considered and any conflicts have been resolved, the standardized query rewrite is generated by QRIL processor 140. In operation 280, a set of standardized query rewrites is updated to include the new query rewrite. In operation 285, the QRIL processor 140 checks to see if any additional QRIL records are to be considered and used to generate standardized rewrites that will be used as part of the set of standardized query rewrites. The process performed by QRIL processor 140 then repeats operations 245 through 285 until all applicable QRIL records are considered.
When all QRIL records are finished being considered, a set of standardized rewrites 142 is output from QRIL processor 140. In various embodiments, this may be an output communication from a cache or local memory of QRIL processor 140. In other embodiments, this may be a final adjustment made by QRIL processor 140 to a text file stored in a separate memory, where the text file comprises the set of standardized rewrites 142. In operation 290, the set of standardized query rewrites is provided to search engine 160. In operation 295, the search engine operates using the set of standardized query rewrites to generate search results in response to queries received from client devices. The search engine proceeds until a system update occurs as part of operation 298. When a system update occurs, the process may repeat from operation 230, with generating new QRIL records, processing the QRIL records to update or generate a new set of standardized query rewrites, and to update the set of standardized query rewrites used by the search engine 160.
Method 300 begins with operation 305, receiving, at a query transcoding device and from a first query rewrite source device, a first set of query rewrite data. In the example embodiment of query transcoding device 400, this query rewrite data is received at input module 422. The set of query rewrite data may include any information related to products or searches, and includes constraint data, metaflag data, and any other related query rewrite data. The query rewrite data includes information that may be used to identify a first trigger value and an associated first query rewrite value which, together with the first trigger value, make up the core information that will become the rewrite. The constraint data may be used to identify appropriate limitations on a related rewrite. The metaflag data includes any information or data relevant to a rewrite type other than the actual trigger and rewrite values. The metaflag data may also include data indicating whether recursive rewrites are allowed for a related rewrite, data that may assist in identifying a category which may be associated with the rewrite if a category constraint is not explicitly identified, or data identifying categories other than an explicitly identified category that may be associated with a rewrite.
In operation 310, the query rewrite data is processed to identify the first trigger and the first query rewrite value. In the example embodiment of query transcoding device 400, this processing may be done using data parser module 424. Data parser module 424 may be a text parser or other computational parser that analyzes the query rewrite data to build a data structure representing the query rewrite data. The data parser analyzes the characters or symbols in the query rewrite data to identify a trigger and a rewrite value as the core part of a rewrite that will be the basis of a QRIL record. The data parser may also use a token or character library to identify matching tokens or strings of characters within the query rewrite data that are associated by the library with certain metadata, constraints, or other elements of a QRIL record.
Operation 315 then involves analyzing the first set of query rewrite data to identify, from a plurality of query rewrite types, a first query rewrite type associated with the first set of query rewrite data. In one embodiment, the data structure generated by data parser module 424 may be used in conjunction with a plurality of rewrite type identifier modules to identify a query rewrite type associated with the query rewrite data. For example, direct rewrite identifier module 426, phrase rewrite identifier module 428, token refinement identifier module 430, and whole query refinement identifier 431 may each include token library or structure information about a rewrite type that is characterized by the rewrite system. As data parser module 424 analyzes query rewrite data, these modules may use the information from the query rewrite data, as analyzed and structured by data parser module 424, to associate the query rewrite data with a query rewrite type. If no query rewrite type is identified by modules 426-431 using data parser module 424, then QRIL generation and formatting module 436 may determine that no QRIL record is to be generated from the query rewrite data.
In addition to the identification of the first trigger and the first query rewrite value in operation 310 and the identification of the first query rewrite type in operation 315, additional embodiments may analyze the query rewrite data for other information. This other information may include details used to create metaflags, details used to identify constraints that tell a system when a rewrite will or will not be used, or other such information. Additional details related to such metaflags are discussed below with respect to
Operation 320 then involves generating a first query rewrite input language (QRIL) record from the first set of query rewrite data. Operation 325 then involves storing the first QRIL record in a QRIL record database with a plurality of QRIL records. The first QRIL record comprises the first trigger value and the first query rewrite value. The QRIL record may be generated by QRIL generation and formatting module 436 using values identified or generated using any module of query transcoding device 400 described above. The QRIL record generated by QRIL generation and formatting module 436 may then be communicated to QRIL record database 130 by output module 438 as part of operation 325. In certain embodiments, QRIL records may include additional elements other than the core elements of the first trigger value and the first query rewrite value.
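Operations 310 through 320 can be sketched as a small parser that extracts a trigger and a rewrite value, identifies a rewrite type, and emits a record only when a type is found. The pipe-delimited input format, the `QrilRecord` class, the `generate_qril` function, and the split between constraint keys and metaflag keys are hypothetical; the description does not define the format of incoming query rewrite data.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QrilRecord:
    trigger: str
    rewrite_value: str
    metaflags: Dict[str, str] = field(default_factory=dict)
    constraints: Dict[str, str] = field(default_factory=dict)


# Hypothetical input format: "trigger => value | type=phrase | site=77"
LINE = re.compile(r"^\s*(?P<trigger>[^=|]+?)\s*=>\s*(?P<value>[^|]+?)\s*(?:\|(?P<rest>.*))?$")
KNOWN_TYPES = {"direct", "phrase", "token_refinement", "whole_query"}
CONSTRAINT_KEYS = {"site", "country", "category"}


def generate_qril(raw: str) -> Optional[QrilRecord]:
    m = LINE.match(raw)
    if m is None:
        return None  # no trigger/value pair could be parsed
    attrs: Dict[str, str] = {}
    for part in (m.group("rest") or "").split("|"):
        if "=" in part:
            key, val = part.split("=", 1)
            attrs[key.strip()] = val.strip()
    rewrite_type = attrs.pop("type", None)
    if rewrite_type not in KNOWN_TYPES:
        return None  # no rewrite type identified, so no QRIL record is generated
    constraints = {k: v for k, v in attrs.items() if k in CONSTRAINT_KEYS}
    metaflags = {k: v for k, v in attrs.items() if k not in CONSTRAINT_KEYS}
    metaflags["rewrite_type"] = rewrite_type  # metaflag identifying the rewrite type
    return QrilRecord(m.group("trigger"), m.group("value"), metaflags, constraints)
```

Here `generate_qril("fone => smartphone | type=direct | site=77")` yields a record whose trigger is “fone”, whose rewrite value is “smartphone”, whose metaflags record the direct rewrite type, and whose constraints hold the site identifier; an entry without a recognized type yields `None`.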
Similarly, a search engine may have information about a country or other geographic location from which a user query originates, and this may be used with country element 516 to constrain certain query rewrites to be used, or not used, when a query originates from the location identified by country element 516 of a particular QRIL record such as QRIL record 500.
Site element 514 may identify a website, merchant storefront, or other e-commerce portal which may act as another constraint on a particular query rewrite. For example, in one embodiment, system 700 may host a plurality of e-commerce marketplaces via the marketplace applications 720. Each marketplace associated with a marketplace application 720 may have a site identifier. That site identifier may be used as a value for site element 514 in QRIL record 500. This may enable an operator of a particular marketplace application 720 to create QRIL record 500 and use site element 514 to constrain QRIL record 500 to apply only to queries originating from the merchant's marketplace application 720, as identified by the value of site element 514.
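Site and country constraints can be checked against the context of an incoming query with a sketch like the following. The class and field names are hypothetical stand-ins for site element 514 and country element 516, and the sketch models only the inclusive case in which a rewrite is used when the query originates from the identified site or country; an exclusive constraint would invert the comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QrilConstraints:
    site: Optional[str] = None     # hypothetical stand-in for site element 514
    country: Optional[str] = None  # hypothetical stand-in for country element 516


@dataclass
class QueryContext:
    site: str     # site identifier of the marketplace the query came from
    country: str  # country the query originated from


def constraints_allow(constraints: QrilConstraints, ctx: QueryContext) -> bool:
    """A rewrite applies only if every constraint that is set matches the query context."""
    if constraints.site is not None and constraints.site != ctx.site:
        return False
    if constraints.country is not None and constraints.country != ctx.country:
        return False
    return True  # unset constraints do not restrict the rewrite
```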
Metaflags 520 of QRIL record 500 may include QRIL elements for any number of different types of information. In the QRIL record 500 of
In addition to the query rewrite type, metaflags 520 may also indicate other details to be associated with the rewrite of QRIL record 500. Exclude element 530 may be used to indicate that certain rewrites are negative rather than positive. This means that the rewrite is done to exclude search results containing the rewrite value rather than to search for results containing the rewrite value. A derived rewrite disabled element 532 may be used to identify whether recursive rewrites are allowed to use the rewrite value of rewrite value element 550 as a trigger for a subsequent rewrite. Category match 539 and phrase categories 538 may identify categories in the category tree of an e-commerce search engine to be used with a search performed with the rewrite value of rewrite value element 550. In other embodiments, any number of other elements may be used as part of a QRIL record such as QRIL record 500.
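The effect of exclude element 530 can be sketched as follows: a positive rewrite replaces the trigger tokens with the rewrite value, while a negative rewrite keeps the query and adds the rewrite value to a list of terms whose matching results are excluded. The `Rewrite` class, the `apply_rewrite` function, the returned pair of lists, and the choice to keep the trigger tokens in a negative rewrite are assumptions, since the paragraph does not say how a search engine consumes excluded terms.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rewrite:
    trigger: str
    value: str
    exclude: bool = False  # hypothetical stand-in for exclude element 530


def apply_rewrite(query: str, rw: Rewrite) -> Tuple[List[str], List[str]]:
    """Return (terms to search for, terms whose matching results are excluded)."""
    tokens = query.split()
    trig = rw.trigger.split()
    n = len(trig)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == trig:
            if rw.exclude:
                # Negative rewrite: keep the query, exclude results containing the value.
                return tokens, [rw.value]
            # Positive rewrite: replace the trigger span with the rewrite value.
            return tokens[:i] + [rw.value] + tokens[i + n:], []
    return tokens, []  # trigger not present; nothing changes
```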
For users to access online resources, providers such as a provider of ecommerce websites often provide a search service to locate resources pertinent to the user's interest. A goal of the provider is to provide results that satisfy several concerns of both the user and the provider, such as relevant results that induce the user to use the provider again, revenue generation for the provider, and satisfying business partner (e.g., advertisers or sponsors) concerns. When the provider is an e-commerce provider, the considerations of, for example, generating revenue from the sales of item listings returned in search results or business partner concerns can be particularly important (e.g., given more weight) in ranking the results than simply the relevance of an item to the search. The provider may have a tremendous amount and variety of information, which can be used to rank results, such as information about the resources it provides, information on user behavior (e.g., how often users have chosen a given resource in response to a search), provider revenue information, or business partner information. Often the provider will use parts of this information to identify and present resources as results in response to a user search in order to meet the provider's goals. Results can be ranked using the information, wherein the ranking may provide the order in which the results appear to the user.
Traditionally, a provider may spend a great deal of time attempting to determine which pieces of information in its possession are relevant to find and present user search results in a way that meets its goals. The chosen pieces of information often must be assembled, used as inputs into a variety of functions, and weighted against each other. All of these actions typically involve manual intervention by the provider at every step (e.g., identifying the data to be used, developing the functions, and determining relative weights of the functions). Such weighting, as part of a searching or matching algorithm that provides search results matching a user query, carries risks of error or of corrupting the integrity of the matching. Manipulating matching weights may have unexpected results. By using query rewriting to transform part or all of a user query, a search engine may enable an optimization which avoids certain of these unexpected risks. Additionally, as described above, constraints may be used with query rewrites to enable optimization to be performed on a per-user basis, a per-storefront basis, a per-geographic-location basis, or other targeted basis.
Such a system may use one or more matching algorithms that can be used to match a user query with database items and to rank user search results, with the top results returned to the user's client device as a set of search results.
In one example embodiment, a system 600 may be an e-commerce search engine associated with a publishing platform such as system 700. The platform of system 700 may include storefronts for a large number of merchants and sales platforms for the merchants. System 700 may also include an auction platform, a payment system for auctions and merchant storefronts, and other e-commerce services. As part of all of these e-commerce services together, system 700 may comprise a category tree which is used to categorize products available for sale or auction via system 700. Such a category tree may include a top level identifying the category tree, and broad categories in a second level under the top level, such as an electronics category, a sports equipment category, an automobile category, or any other such category. Each of these categories may be used as a constraint in a QRIL record as described above. Each second-level category may include one or more third-level categories which are associated in the tree with one or more second-level categories. For example, the electronics category may have third-level categories of televisions, computers, smart phones, tablet devices, and other such categories structured under the second-level electronics category in the category tree. Each bottom-level category or any category in the category tree may have associated keywords, metadata, or other such information relevant to products available for sale via system 700 which are categorized by the category tree.
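The category tree described here can be represented as nested mappings, which also makes it straightforward to validate a category constraint in a QRIL record before the record is used. This is a minimal sketch: the slash-separated path syntax mirrors category strings such as “electronics/televisions” used elsewhere in this description, but the `CATEGORY_TREE` contents and the `category_path_exists` helper are hypothetical.

```python
from typing import Dict

# Hypothetical toy tree: keys are category names, values are subcategory mappings.
# The root mapping plays the role of the top level that identifies the tree.
CATEGORY_TREE: Dict[str, dict] = {
    "electronics": {
        "televisions": {},
        "computers": {},
        "smart phones": {},
        "tablet devices": {},
    },
    "sports equipment": {},
    "automobile": {},
}


def category_path_exists(tree: Dict[str, dict], path: str) -> bool:
    """Return True if a path such as 'electronics/televisions' names a node in the tree."""
    node = tree
    for name in path.split("/"):
        if name not in node:
            return False
        node = node[name]  # descend one level
    return True
```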
Further, a rewrite may have not only a trigger and a rewrite value, but also a category rewrite. A category rewrite may limit a search to a particular category in a category tree. QRIL record 500, for example, includes category rewrite 552. A category rewrite may be a rewrite that, instead of replacing a trigger token with a rewrite value, limits a search based on a user query to a particular category of a category tree. For example, a QRIL record 500 may include a trigger “brandA televisions” where the rewrite is a token adjustment rewrite to delete the token “televisions” and to add the category rewrite “electronics/televisions.” Thus, when a user query including “brandA televisions” is received, the rewrite associated with this QRIL record that is part of a search engine's set of standardized rewrites will rewrite “brandA televisions” to a query like “brandA:category=electronics/televisions.” The search engine will then search for the term “brandA” but only within the category “televisions” under “electronics” in the category tree.
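The “brandA televisions” example can be sketched as a token adjustment rewrite that deletes tokens from the matched trigger span and then appends a category restriction. The `TokenAdjustmentRewrite` class and the `apply_token_adjustment` function are hypothetical, and the `:category=` suffix simply reproduces the illustrative query form quoted above rather than the syntax of any particular search engine.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TokenAdjustmentRewrite:
    trigger: str
    delete_tokens: Tuple[str, ...]  # tokens removed from the matched trigger span
    category: Optional[str] = None  # category rewrite, e.g. "electronics/televisions"


def apply_token_adjustment(query: str, rw: TokenAdjustmentRewrite) -> str:
    tokens = query.split()
    trig = rw.trigger.split()
    n = len(trig)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == trig:
            kept = [t for t in tokens[i:i + n] if t not in rw.delete_tokens]
            rewritten = " ".join(tokens[:i] + kept + tokens[i + n:])
            if rw.category is not None:
                rewritten += ":category=" + rw.category  # restrict search to the category
            return rewritten
    return query  # trigger not present; the query is left unchanged
```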
When a user query is received by front end 606, the user query may be sent to QFM 604 for query rewriting and data factor generation. Query rewriting in QFM 604 may use a set of standardized query rewrites as described above. Additionally, data factor generation may identify categories that are associated with the user query. For example, history data associated with a user that submits user query 602 may be used to disambiguate terms such as “Apple.” Such categorization may be associated with user query 602 and used as constraining information based on any constraints associated with query rewrites as part of a set of standardized query rewrites. Additionally, as described for example in QRIL record 500, query rewrites generated from QRIL records may include category values which are associated with the category tree described above. In such embodiments, an additional query rewrite type may include fuzzy category rewrites. A fuzzy category rewrite refers to the use of keywords, products, or other terms within a category tree that are associated with the user query, or with tokens of the user query, by the data factor generation of QFM 604. QFM 604 may thus include one or more modules for categorization of user query 602, which is then used for dynamic query rewriting based on information within the modules. This may include time-sensitive information associated with merchant sales, holiday sales, user history associations with particular merchants, or any other such information which may be used in data factor generation as a dynamic input to a fuzzy category query rewrite.
In embodiments where fuzzy category query rewrites are used, the system will include conflict rules for this type of rewrite in addition to all other types of rewrites available to the system. In one embodiment, for example, fuzzy category query rewrites are given the lowest priority and are only used if no other rewrites are present for user query 602. In certain embodiments, conflicts between multiple fuzzy query rewrites will typically not occur because a fuzzy query rewrite will be a single rewrite generated by a fuzzy rewrite system. This single rewrite is generated based on a category analysis or some other analysis system where rewrites are based on groups of category associations rather than a defined transform from a token of a user query to a rewrite value. Instead, a fuzzy query rewrite will be based on preference information or system settings within a fuzzy query rewrite module as part of data factor generation in QFM 604.
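The fallback rule in this embodiment, where a fuzzy category rewrite is used only when no standardized rewrite applies, can be sketched as follows. The keyword-to-category table is a hypothetical stand-in for the category analysis performed during data factor generation in QFM 604, the function names are illustrative, and the sketch returns at most one fuzzy rewrite, consistent with a fuzzy rewrite system that produces a single rewrite.

```python
from typing import Dict, List, Optional

# Hypothetical keyword-to-category associations standing in for the category
# analysis performed during data factor generation.
CATEGORY_ASSOCIATIONS: Dict[str, str] = {
    "televisions": "electronics/televisions",
    "shoes": "clothing, shoes and accessories/men's shoes/athletic",
}


def fuzzy_category_rewrite(query: str) -> Optional[str]:
    """Return a single category-restricted rewrite, or None if no association is found."""
    for token in query.split():
        category = CATEGORY_ASSOCIATIONS.get(token)
        if category is not None:
            return f"{query}:category={category}"
    return None


def select_rewrites(query: str, matched_standard_rewrites: List[str]) -> List[str]:
    """Standardized rewrites take precedence; the fuzzy rewrite is the lowest-priority fallback."""
    if matched_standard_rewrites:
        return list(matched_standard_rewrites)
    fuzzy = fuzzy_category_rewrite(query)
    return [fuzzy] if fuzzy is not None else []
```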
A fuzzy query rewrite may cause problems when descendant or recursive query rewrites are allowed with fuzzy query rewrites. For example, a fuzzy query rewrite may rewrite a keyword of user query 602 to a category of the category tree. Thus, instead of a search for tokens of the user query, a search will be performed directed to keywords, products, or other information associated with the category of the category tree. Such a rewrite to a category may further allow query rewrites based on information within the category. As an example, the user query “brandA men's shoes” may be rewritten by a fuzzy query rewrite to “brandA” and an associated category search restriction on “clothing, shoes and accessories/men's shoes/athletic.” If the system also includes a direct rewrite with the trigger “brandA. shoes” with a rewrite value of “brandA” and a category search restriction on category “clothing, shoes and accessories/men's shoes,” then the second, conflicting rewrite will potentially include a much broader set of results than the first rewrite. As described above, such a conflict may be resolved either by prioritizing a rewrite type or by prioritizing the rewrite that will result in a narrower set of search results.
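The two resolution options named at the end of this paragraph can be sketched as follows, treating a deeper category path as the narrower restriction. The `CategoryRewrite` class, the `TYPE_RANK` table (which places fuzzy rewrites last, as in the lowest-priority embodiment above, and otherwise follows one of the orderings described earlier), and the use of path depth as the measure of narrowness are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical type ranking for the "prioritize a rewrite type" option (lower wins).
TYPE_RANK = {"whole_query": 0, "direct": 1, "phrase": 2, "token_refinement": 3, "fuzzy": 4}


@dataclass
class CategoryRewrite:
    value: str
    category: str      # slash-separated path in the category tree
    rewrite_type: str  # a key of TYPE_RANK


def category_depth(path: str) -> int:
    return len([part for part in path.split("/") if part])


def resolve(a: CategoryRewrite, b: CategoryRewrite, by: str = "narrowest") -> CategoryRewrite:
    if by == "type":
        return min(a, b, key=lambda rw: TYPE_RANK[rw.rewrite_type])
    if by == "narrowest":
        # A deeper category path is expected to return a narrower set of results.
        return max(a, b, key=lambda rw: category_depth(rw.category))
    raise ValueError(f"unknown resolution strategy: {by}")
```

Applied to the “brandA men's shoes” example, the narrowest option keeps the fuzzy rewrite, whose category path has three levels, while the type option keeps the direct rewrite.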
The query may be rewritten using a set of standardized rewrites, such as the set of standardized rewrites 142 generated by QRIL processor 140 from a set of QRIL records in QRIL record database 130.
The query node 612 can apply one or more ranking goal models 610 to the query profile. Such a ranking goal model 610 may identify the type of match that qualifies as a search result for a particular query or query profile. In an example, the goal models can also be used to select search results from database 614. The database 614 can return a search index of the item listings returned as a result of the query 602.
Item index 616 can return the raw item data to the query node 612, where the list of item listings 618 is unranked (e.g., unordered). The set of ranking data factors 620 can include all of the data factors for a given item listing and query 602 to be used by the set of ranking goal models 610. The factors can be input into the ranking goal models 610 to produce a ranked result set 622 that can then be presented to the user. In an example, a higher ranked item listing can be displayed more prominently than a lower ranked item listing (e.g., the higher ranked item listing can appear higher in the list of search results presented to the user than the lower ranked item listing). In an example, prominently displaying the higher ranked listings can include using color (e.g., varying background or foreground color), animation, or additional visual decorations (e.g., borders, titles, etc.).
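One way to turn ranking data factors 620 into ranked result set 622 is a weighted combination of goal-model scores, sketched below. The description does not specify how goal models are combined, so the weighted sum, the `rank_listings` function, and the factor names in the test are assumptions; a real system could combine goal models quite differently.

```python
from typing import Callable, Dict, List, Tuple

Factors = Dict[str, float]  # ranking data factors for one item listing
GoalModel = Callable[[Factors], float]


def rank_listings(listings: Dict[str, Factors],
                  goal_models: List[Tuple[GoalModel, float]]) -> List[str]:
    """Order listing ids from highest to lowest weighted goal-model score."""
    def score(listing_id: str) -> float:
        factors = listings[listing_id]
        return sum(weight * model(factors) for model, weight in goal_models)

    return sorted(listings, key=score, reverse=True)
```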
In example 600, the search query may be for items being sold in an online publishing system or marketplace, but other examples where a user queries a data resource and the results are ranked and returned are also contemplated. The various components of system 600 can be executed in software, hardware, or some combination thereof. In the case of software components, it will be understood that the hardware necessary to execute the software will also be present.
An Application Program Interface (API) server 714 and a web server 716 are coupled to, and provide programmatic and web interfaces respectively to, one or more application servers 718. The application servers 718 host one or more marketplace applications 720, payment applications 722, and search engine 723. The application servers 718 are, in turn, shown to be coupled to one or more database servers 724 that facilitate access to one or more databases 726.
The marketplace applications 720 may provide a number of marketplace functions and services to users that access the networked system 702. The payment applications 722 may likewise provide a number of payment services and functions to users. The payment applications 722 may allow users to accumulate value (e.g., in a commercial currency, such as the U.S. dollar, or a proprietary currency, such as “points”) in accounts, and then later to redeem the accumulated value for products (e.g., goods or services) that are made available via the marketplace applications 720. While the marketplace and payment applications 720 and 722 are shown in
Further, while the system 700 shown in
The web client 706 accesses the various marketplace and payment applications 720 and 722 via the web interface supported by the web server 716. Similarly, the programmatic client 708 accesses the various services and functions provided by the marketplace and payment applications 720 and 722 via the programmatic interface provided by the API server 714. The programmatic client 708 may, for example, be a seller application (e.g., the TurboLister application developed by eBay Inc., of San Jose, Calif.) to enable sellers to author and manage listings on the networked system 702 in an off-line manner, and to perform batch-mode communications between the programmatic client 708 and the networked system 702.
The example computer system 800 includes a processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 804 and a static memory 806, which communicate with each other via a bus 808. The computer system 800 may further include a video display unit 810 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 800 also includes an alphanumeric input device 812 (e.g., a keyboard), a cursor control device 814 (e.g., a mouse), a disk drive unit 816, a signal generation device 818 (e.g., a speaker) and a network interface device 820.
The disk drive unit 816 includes a machine-readable medium 822 on which is stored one or more sets of instructions (e.g., software 824) embodying any one or more of the methodologies or functions described herein. The software 824 may also reside, completely or at least partially, within the main memory 804 and/or within the processor 802 during execution thereof by the computer system 800, the main memory 804 and the processor 802 also constituting machine-readable media.
The software 824 may further be transmitted or received over a network 826 via the network interface device 820.
While the machine-readable medium 822 is shown in an example embodiment to be a single medium, the term “non-transitory machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media.
Thus, a method and system for search result ranking using machine learning have been described. Although the present invention has been described with reference to specific example embodiments, it will be evident that various modifications and changes can be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
While in the foregoing specification certain embodiments of the invention have been described, and many details have been set forth for purposes of illustration, it will be apparent to those skilled in the art that the inventive subject matter is susceptible to additional embodiments and that certain of the details described herein can be varied considerably without departing from the basic principles of the invention.
The Abstract is provided to comply with 37 C.F.R. Section 1.72 (b) requiring an abstract that will allow the reader to ascertain the nature and gist of the technical disclosure. It is submitted with the understanding that it will not be used to limit or interpret the scope or meaning of the claims. The following claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate embodiment.
Claims
1. A system comprising:
- a query transcoding device comprising:
- an input module that receives, from a first query rewrite source device, a first set of query rewrite data, wherein the first set of query rewrite data comprises constraint data, metaflag data, and rewrite data, wherein the constraint data comprises at least a first trigger value, and wherein the rewrite data identifies at least a first query rewrite value associated with the first trigger value;
- a data parser module coupled to the input module that processes the first set of query data to identify the first trigger and the first query rewrite value, and that communicates parsed query data to one or more identifier modules to identify a first query rewrite type associated with the first set of query rewrite data from a plurality of query rewrite types; and
- a query rewrite input language (QRIL) record generation and formatting module that generates a first QRIL record from the first set of query rewrite data, wherein the first QRIL record comprises the first trigger value, the first query rewrite value, and a first metaflag element that identifies the first QRIL record as associated with the first query rewrite type.
Type: Application
Filed: Aug 7, 2017
Publication Date: Nov 23, 2017
Inventors: Prathyusha Senthil Kumar (San Jose, CA), Praveen Arasada (San Jose, CA), Ravi Chandra Jammalamadaka (Santa Clara, CA)
Application Number: 15/670,426