METHOD OF COMPUTING A COOPERATIVE ANSWER TO A ZERO-RESULT QUERY THROUGH A HIGH LATENCY API

An information retrieval system including a user operated computer having a display and a storage facility, a search back-end containing an index into a collection of documents, the index defining result sets of queries relative to the collection of documents, and a search application stored in the storage facility which is executable by the computer to perform a method of providing a cooperative answer to a query submitted by the user, the submitted query having an empty result set, the cooperative answer comprising subqueries of the submitted query and, for each subquery, cardinality information indicative of whether the subquery has an empty result set. The method includes making an initial set of requests to the search back-end, a request specifying a query and calling for a response that provides cardinality information indicative of whether the subquery has an empty result set, the response being said to be positive if the result set is non-empty and negative if the result set is empty, the requests in the initial set of requests being made in parallel and specifying queries that are subqueries of the submitted query. When a positive response to a request is received, the method includes displaying the query specified by the request together with cardinality information provided by the response which enables the user to determine that the result set of the specified query is not empty, the specified query and the cardinality information being part of the cooperative answer. When a negative response to a request is received, the method includes making a subsequent set of requests to the search back-end, the requests being made in parallel and specifying queries that are subqueries of the query specified by the request for which the negative response is received.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

The subject matter of this application is related to the subject matter of U.S. Provisional Patent Applications No. 61/106,570, filed on Oct. 18, 2008, and No. 61/201,855, filed on Dec. 16, 2008, priority to which is claimed under 35 U.S.C. §119(e) and which are incorporated herein by reference.

The subject matter of this application is also related to the subject matter of a companion Non-Provisional patent application entitled “Facilitating Browsing of Result Sets”, by F. Corella and K. P. Lewison, filed simultaneously with this application and incorporated herein by reference.

BACKGROUND

In information retrieval, when a query submitted by a user fails (i.e. the query has an empty result set), it is beneficial to provide to the user a so-called cooperative response, or cooperative answer, to the query that provides additional information other than the fact that there are no results. Many kinds of cooperative answers, applicable to many kinds of information retrieval systems, have been proposed in the academic literature. For example, one approach proposed one type of cooperative answer to conjunctive queries, and another, broader type of cooperative answer to Boolean queries, applicable in the context of a bibliographic information retrieval system.

Today, all Web search engines accept conjunctive queries, and some accept Boolean queries, such that the cooperative answers proposed in the above-referenced approach are applicable in the context of Web search. A cooperative answer according to this approach consists of a collection of subqueries of the failed query submitted by the user, a subquery being a query obtained by removing syntactic elements from the submitted query, the subqueries in the collection being more general than the submitted query. The collection of subqueries comprises subqueries that fail like the submitted query (even though they are more general and thus less likely to fail than the submitted query), and subqueries that succeed (i.e. that have a non-empty result set). The subqueries that succeed are useful as possible follow-up queries. The subqueries that fail are useful because they avoid the repeated failures that would occur if the user submitted those subqueries, and because each of them is, in some sense, an explanation of the failure.

Recently, an important computational architecture has emerged for Web search engines. In this architecture, the search engine is split into two independent components, a front-end that runs on a client machine operated by a user and a back-end that runs on one or more server machines, the front-end being connected to the back-end via the Internet and accessing the back-end through an HTTP-based network interface called a Web application programming interface (Web API). The front-end provides a user interface, while the back-end provides a search index. In operation, the user submits a query through the user interface on the client machine, and the client machine forwards the query to the back-end which computes a result set using the index. The back-end provides pages of the result set of the query to the front-end, and the front-end displays those pages to the user.

Many search back-ends are available on the Web today, including back-ends provided by the companies that own the most popular search engines, Google, Yahoo, and Bing, those back-ends using the same indices that are used in those popular search engines. Many hundreds of search front-ends have been developed, resulting in the many hundreds of search engines that are available on the Web today.

This split search engine architecture has important advantages, but also has a disadvantage when it comes to providing cooperative answers. A Web API, because it is accessed via the Internet, has a high latency, usually of a few hundred milliseconds. Such a high latency makes it impractical to compute a cooperative answer by any of the methods that have been proposed so far, all of which assume that the cooperative answer is computed on the same machine where result sets are computed. A cooperative answer consists of many subqueries, and its computation requires interrogating the back-end through the API about those subqueries and many more subqueries that do not end up in the cooperative answer. Computing a cooperative answer by traditional methods can easily take 30 seconds or more, which is unacceptable to the user in a modern computing environment.

SUMMARY

In one embodiment, an information retrieval system is provided which includes a user operated computer having a display and a storage facility, a search back-end containing an index into a collection of documents, the index defining result sets of queries relative to the collection of documents, and a search application stored in the storage facility which is executable by the computer to perform a method of providing a cooperative answer to a query submitted by the user, the submitted query having an empty result set, the cooperative answer comprising subqueries of the submitted query and, for each subquery, cardinality information indicative of whether the subquery has an empty result set. The method includes making an initial set of requests to the search back-end, a request specifying a query and calling for a response that provides cardinality information indicative of whether the subquery has an empty result set, the response being said to be positive if the result set is non-empty and negative if the result set is empty, the requests in the initial set of requests being made in parallel and specifying queries that are subqueries of the submitted query. When a positive response to a request is received, the method includes displaying the query specified by the request together with cardinality information provided by the response which enables the user to determine that the result set of the specified query is not empty, the specified query and the cardinality information being part of the cooperative answer. When a negative response to a request is received, the method includes making a subsequent set of requests to the search back-end, the requests being made in parallel and specifying queries that are subqueries of the query specified by the request for which the negative response is received.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are included to provide a further understanding of embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain principles of embodiments. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. The elements of the drawings are not necessarily to scale relative to each other. Reference numerals consist of a concatenation of a one- or two-digit number referring to a figure, followed by a two-digit number that locates the referenced part within the figure. A reference numeral introduced in a figure may be used in other figures to refer to the same part or a similar part.

FIG. 1 is a block diagram generally illustrating an example of an information retrieval system for providing cooperative answers to zero-result queries through a high-latency API according to embodiments described herein.

FIG. 2 is a block diagram illustrating user-interface elements presented on a display according to one embodiment.

FIG. 3 is an example of a cooperative answer displayed in a left panel according to one embodiment.

FIG. 4 is an example of a cooperative answer to a query that uses a site restriction according to one embodiment.

FIG. 5 is an example of a cooperative answer comprised of queries with associated checkboxes according to one embodiment.

FIG. 6 is an illustration of the concept of a conjunction graph of a conjunctive query.

FIG. 7 is an illustration of the concept of a conjunction graph of a conjunctive query in which conjunction operators are omitted.

FIG. 8 is an illustration of the concept of a conjunction graph of a Boolean query in negation normal form.

FIG. 9 is a flow diagram illustrating a process followed by a search application to initiate a computation of a cooperative answer to a conjunctive query.

FIG. 10 is a flow diagram illustrating a process followed by a search application, while computing a cooperative answer to a conjunctive query, when a response is received from a back-end.

FIG. 11 is a flow diagram illustrating a process followed by a search application to initiate a computation of a cooperative answer to a Boolean query in negation normal form.

FIG. 12 is a flow diagram illustrating a process followed by a search application while computing a cooperative answer to a Boolean query in negation normal form when a response is received from a back-end.

DETAILED DESCRIPTION

In the following Detailed Description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present invention. The following detailed description, therefore, is not to be taken in a limiting sense. It is to be understood that the features of the various exemplary embodiments described herein may be combined with each other, unless specifically noted otherwise.

FIG. 1 is a block diagram generally illustrating an example of an information retrieval system 100 for providing cooperative answers, or cooperative responses, to zero-result queries through a high-latency API, using a method of computing cooperative answers, or cooperative responses, according to one embodiment, that makes it practical to provide such answers in spite of the high latency of the API.

The word “response” has several meanings in the context of a search. It can, for example, refer to a page of results presented to the user after the user submits a query, to a page of results and other information returned from a search back-end in response to a request made by a front-end, or to a cooperative response that consists of a set of subqueries. To avoid confusion, the term “cooperative answer” is hereafter used in lieu of the term “cooperative response”.

System 100 includes a computer 110 operated by a user 120, a network 130, and a search back-end 140 that hosts an index 150 into a collection of documents. Computer 110 is equipped with a display 112, a keyboard 114, and a mouse 116, and includes a storage facility 160. An executable program, called a search application 162, is stored in storage facility 160 and runs on computer 110.

Given a query, index 150 defines the result set of the query, each result in the result set being a data item providing information about a document that matches the query, including a link to the document. In one embodiment the collection of documents is a collection of files reachable on the World Wide Web. In one embodiment, the collection of documents comprises a collection of documents on an intranet.

Search back-end 140 is connected to the network 130 and exposes a network interface 142 that computer 110 uses to communicate with search back-end 140. Network interface 142 is a logical specification of messages that can be sent to and received from search back-end 140. For illustrative purposes, network interface 142 is shown in FIG. 1 as what appears to be a physical component.

The latency of network interface 142, relative to computer 110, is the time it takes for computer 110 to receive a response to a request made through network interface 142, including any delay due to network 130. In one embodiment, where the network 130 is the Internet and the collection of documents indexed by the index 150 of search back-end 140 is a substantial portion of the set of files reachable on the World Wide Web, the latency is typically a few hundred milliseconds, such latency being considered a high latency. Methods according to the present embodiments are especially beneficial in such high latency architectures.

In one embodiment, network interface 142 exposed by search back-end 140 uses Hypertext Transfer Protocol (HTTP) messages, such an interface being called an HTTP-based interface. In one such embodiment, network 130 is the Internet; the HTTP-based interface is then called a Web interface, or a Web Application Programming Interface (Web API). In one embodiment, network 130 is an intranet.

In one embodiment, where network 130 is the Internet, storage facility 160 contains a Web browser program 168 that runs on computer 110, and search application 162 is downloaded by Web browser 168 from a Web site 170 connected to the Internet. In one embodiment, search application 162 runs on computer 110 by being interpreted by a browser extension (also known as a browser plug-in) 180.

In one embodiment, the browser plug-in 180 is supplied by Adobe Systems, Inc. and called Flash Player®, and search application 162 is built on a programming platform also supplied by Adobe Systems, Inc. and called Flex®. In such an embodiment, search application 162 comprises project code written in MXML and ActionScript and compiled by a platform compiler, and platform code supplied by Adobe Systems, Inc., some of which is linked with the project code. The project code is an event-driven program, comprising routines that are invoked when events take place. Such a routine is called an “event listener” or “event handler”, and is said to “handle” or “react to” the event upon which it is invoked.

Search application 162 uses display 112, keyboard 114, and mouse 116 of computer 110 to interact with user 120. Keyboard 114 is used to enter text into input fields of display 112. Mouse 116 is used for performing user-interface procedures, such as clicking on a button.

Search application 162 uses network interface 142 to fetch data from search back-end 140, including data such as:

(1) an estimate of the number of results in the result set of a query, called a “cardinality estimate” for the query; and

(2) a page of the result set of a query, pages being numbered, each page containing a fixed number of consecutive results (e.g. page 1 containing results 1 through 10, page 2 containing results 11 through 20, etc.).

In one embodiment, search application 162 fetches such information by making a request to search back-end 140 via network 130. The request conforms to network interface 142 and specifies the query, as well as a desired range of results from the result set of the query. The request elicits a response that fulfills the request. The response also conforms to the network interface. The response contains the cardinality estimate and, if the result set is not empty, the page of results. The cardinality estimate is accurate as to whether the result set is empty or not (i.e. if the estimate is zero the result set is empty and if the estimate is not zero the result set is not empty). A response with a zero estimate is said to be a negative response, while a response with a non-zero estimate is said to be a positive response.
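The request and response conventions just described can be illustrated with a minimal Python sketch. The `Response` class, its field names, and the page size of 10 are assumptions made for illustration only; they are not part of network interface 142 as described here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

RESULTS_PER_PAGE = 10  # assumed page size, matching the "results 1 through 10" example


def page_bounds(page_number: int, per_page: int = RESULTS_PER_PAGE) -> Tuple[int, int]:
    """Return the 1-based (first, last) result numbers contained in a page."""
    if page_number < 1:
        raise ValueError("pages are numbered from 1")
    first = (page_number - 1) * per_page + 1
    return first, first + per_page - 1


@dataclass
class Response:
    """Hypothetical shape of a back-end response (illustrative names)."""
    query: str
    cardinality_estimate: int
    results: List[str] = field(default_factory=list)

    @property
    def is_positive(self) -> bool:
        # Relies on the estimate being accurate as to emptiness:
        # zero means an empty result set, non-zero means a non-empty one.
        return self.cardinality_estimate != 0
```

Under these assumptions, a response with an estimate of zero is negative and any other response is positive, regardless of how imprecise the estimate is otherwise.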

FIG. 2 illustrates an example of display 112 being used by search application 162 to present user-interface elements to user 120, according to one embodiment. The user-interface elements include an input field 210, called a “query box” or “search box”, where user 120 can enter a query, a search button 220 that user 120 can click to submit the query, and three panels, a left panel 230, a center panel 240 and a right panel 250. In one embodiment, these user-interface elements are presented inside a window 260 opened by Web browser 168 on display 112. In one embodiment, right panel 250 is used to display advertisements.

In one embodiment, left panel 230 contains a set of queries whose result sets can be browsed simultaneously in center panel 240, as disclosed in the above-referenced companion patent application. When user 120 submits a query, the submitted query and additional queries are placed in left panel 230. If the submitted query has an empty result set and satisfies conditions described in the above-referenced companion application (i.e. does not consist of a single search term and is not overly complex), the additional queries placed in left panel 230 are subqueries of the submitted query that constitute a cooperative answer to the query.

FIG. 3 shows an example of a cooperative answer displayed in left panel 230, according to one embodiment. Left panel 230 contains a query 310 submitted by user 120 that has an empty result set, viz. the query “smoothie asdf qwer xyz13924”. Left panel 230 also contains five subqueries of the query 310 that together constitute a cooperative answer to the query 310. The subqueries have fewer search terms than the query. They are therefore more general than the query, and have result sets that are supersets of the result set of the query.

The cooperative answer lists three subqueries that have non-empty result sets, subqueries 320, 330, and 340, followed by two subqueries that have empty result sets, subqueries 350 and 360. Queries that have a non-empty result set are said to succeed and are called “succeeding subqueries”. Queries that have empty result sets are said to fail and are called “failing queries” or “zero-result queries”.

Succeeding subqueries are included in the cooperative answer because they are useful as possible follow-up queries. A succeeding subquery is included in the cooperative answer only if there is no succeeding subquery that is more specific (i.e. that has more search terms). For example, the succeeding subquery “smoothie” is not included because the succeeding subquery 320 “smoothie asdf” and the subquery 330 “smoothie qwer” are more specific succeeding subqueries.

Failing subqueries are also useful because they help user 120 avoid repeated failures. For example, after the failure of the query 310 “smoothie asdf qwer xyz13924”, user 120 may try “smoothie asdf qwer”, which also fails. This is avoided by letting user 120 know that this query, included in the cooperative answer as subquery 350, also fails even though it is more general than the submitted query 310. A failing subquery is included in the cooperative answer only if there is no failing subquery that is more general (i.e. that has fewer search terms). For example, the failing subquery “smoothie xyz13924” is not included because the failing subquery 360 “xyz13924” is more general.

Another reason why failing subqueries are useful is that they can be viewed as explanations of the failure of the submitted query. For example, the query 310 can be viewed as failing for two independent reasons: because there are no Web pages containing the words “smoothie”, “asdf” and “qwer”, and because there are no Web pages containing the word “xyz13924”.

A contrived example was used in FIG. 3 because few queries that target the Web at large and consist of a combination of a few dictionary words have empty result sets. Empty result sets, however, occur more frequently when queries target a subset of the Web, such as a particular Web site. In one embodiment, a query can be made to target a site by using a site restriction, which is a search term consisting of the operator “site:” immediately followed by the DNS domain of the site.
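As a small illustration of the site restriction described above, the following Python sketch appends and strips a “site:” term. The function names are hypothetical, and the sketch assumes that search terms are separated by whitespace and that the restriction is written as a single term.

```python
from typing import List, Tuple

SITE_OPERATOR = "site:"


def add_site_restriction(query: str, domain: str) -> str:
    """Append a site restriction (operator immediately followed by the DNS domain)."""
    return f"{query} {SITE_OPERATOR}{domain}"


def strip_site_restriction(query: str) -> Tuple[str, List[str]]:
    """Split a query into its non-site terms and the list of targeted domains."""
    terms = query.split()
    kept = [t for t in terms if not t.startswith(SITE_OPERATOR)]
    domains = [t[len(SITE_OPERATOR):] for t in terms if t.startswith(SITE_OPERATOR)]
    return " ".join(kept), domains
```

Removing the site restriction is one way of obtaining a more general subquery, since the restriction is itself a search term of the query.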

FIG. 4 shows an example of a cooperative answer to a query that uses a site restriction, according to one embodiment. The query 410 “smoothie banana durian site:myrecipes.com” has an empty result set. The subqueries 420, 430, and 440 of the query 410 constitute a cooperative answer to the query 410. The subquery 420 informs user 120 that results for “smoothie banana durian” can be found outside the targeted site, while the subquery 430 informs user 120 that results for “smoothie banana” can be found in the site; both are possible follow-up queries. The subquery 440 informs user 120 that the word “durian” does not appear at all in the targeted site.

In one embodiment, a query can be made to target a particular site without using a site restriction, by specifying the DNS domain of the site in a separate input box.

FIG. 5 shows the same query 410 and the same cooperative answer comprising subqueries 420, 430, 440, according to one embodiment, where every query in left panel 230 has an associated checkbox, as described in the above-referenced companion patent application. FIG. 5 shows the state of panel 230 immediately after user 120 has submitted the query 410. The queries 410, 420, 430, and 440 will be deleted from left panel 230 when user 120 submits a subsequent query, unless user 120 removes a checkmark from a checkbox in order to retain the associated query. Queries 510 and 520 illustrate unrelated queries that were added earlier to left panel 230 and have been retained.

In one embodiment, a query is a Boolean expression consisting of the Boolean operators AND, OR, and NOT, and of search terms, with spaces used to separate syntactic elements and parentheses used for grouping, such a query being called a Boolean query. A search term can be a word or a phrase surrounded by double quotes. In one embodiment, a search term can be a colon expression, which begins with an operator consisting of a keyword followed by a colon. An example of a colon expression is the site restriction used above in the example of FIG. 4, which uses the operator “site:” consisting of the keyword “site” followed by a colon.

A Boolean query is said to be a conjunction, or a conjunctive query, if its operator with broadest scope, called the top-level operator, is the conjunction operator AND. It is said to be a disjunction, or a disjunctive query, if its top-level operator is the disjunction operator OR. It is said to be a negation if its top-level operator is the negation operator NOT. Since AND and OR are associative operators, parentheses can be omitted without ambiguity around an inner conjunction that is an operand of an outer conjunction, and around an inner disjunction that is an operand of an outer disjunction.

The conjuncts of a conjunctive query can be defined as follows: remove all unnecessary parentheses, then consider the sequence of top-level AND operators as an n-ary conjunction; the conjuncts are then the operands of the n-ary conjunction. Consider for example the expression “asdf AND (qwer AND (xyz OR (abc AND def)))”. It has an unnecessary pair of parentheses. After removing the unnecessary parentheses it becomes “asdf AND qwer AND (xyz OR (abc AND def))”. This can be viewed as a ternary conjunction with operands “asdf”, “qwer” and “xyz OR (abc AND def)”. Those three operands are the conjuncts of the conjunction.
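The definition of conjuncts can be sketched in Python as follows. The representation of a parsed query as nested tuples, such as `("AND", left, right)`, with search terms as plain strings, is an assumption made for illustration; no parser is described here.

```python
def conjuncts(query):
    """Flatten nested top-level ANDs into the list of conjuncts.

    A query is either a string (a search term) or a tuple whose first
    element is "AND", "OR" or "NOT" and whose remaining elements are operands.
    """
    if isinstance(query, tuple) and query[0] == "AND":
        flattened = []
        for operand in query[1:]:
            # An inner conjunction that is an operand of an outer conjunction
            # contributes its own conjuncts (associativity of AND).
            flattened.extend(conjuncts(operand))
        return flattened
    # Anything else (term, disjunction, negation) is a single operand.
    return [query]


# "asdf AND (qwer AND (xyz OR (abc AND def)))"
example = ("AND", "asdf", ("AND", "qwer", ("OR", "xyz", ("AND", "abc", "def"))))
```

Note that the conjunction nested inside the disjunction is left untouched, because it is not an operand of a top-level AND.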

The disjuncts of a disjunctive query are defined in the same way as the conjuncts of a conjunctive query, mutatis mutandis.

Given a conjunctive query, a “conjunctive subquery” of the query is obtained by removing one or more conjuncts from the query, an “immediate conjunctive subquery” being obtained by removing exactly one conjunct. For example, “asdf AND poiu” is an immediate conjunctive subquery of “asdf AND qwer AND poiu”, obtained by removing the middle conjunct “qwer”; and “poiu” is a conjunctive subquery obtained by removing the first two conjuncts.
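A minimal Python sketch of these two definitions follows. It represents a conjunctive query as a tuple of distinct conjuncts and assumes that at least one conjunct must remain after removal; both choices are illustrative assumptions.

```python
from itertools import combinations


def immediate_conjunctive_subqueries(conjs):
    """Subqueries obtained by removing exactly one conjunct."""
    conjs = tuple(conjs)
    if len(conjs) < 2:
        return []  # a single term has no subqueries
    return [conjs[:i] + conjs[i + 1:] for i in range(len(conjs))]


def conjunctive_subqueries(conjs):
    """Subqueries obtained by removing one or more conjuncts, keeping at least one."""
    conjs = tuple(conjs)
    return [
        kept
        for size in range(len(conjs) - 1, 0, -1)
        for kept in combinations(conjs, size)  # preserves conjunct order
    ]
```

For a query with n conjuncts this yields n immediate conjunctive subqueries and 2^n - 2 conjunctive subqueries in total, which is why querying all of them serially over a high-latency interface becomes expensive quickly.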

It should be observed that, whereas in the phrase “conjunctive query” the word “conjunctive” refers to the contents of the query, in the phrase “conjunctive subquery” the word “conjunctive” refers to how the subquery is derived from a query that contains it. Thus, a conjunctive subquery is not necessarily a conjunctive query. For example, “poiu” is a conjunctive subquery of “asdf AND qwer AND poiu” but it is not a conjunctive query (i.e. it is not a conjunction), since it consists of a single search term.

A conjunctive query defines a graph, called the conjunction graph of the query, whose nodes are the query and its conjunctive subqueries. There is an edge between a first node and a second node if the second node is an immediate conjunctive subquery of the first node; the first node is then said to be a parent of the second node, and the second node a child of the first node.

As an example, FIG. 6 shows the conjunction graph 600 of the query “asdf AND qwer AND (xyz OR (abc AND def))”. The top node 610 is the query itself. Nodes 620, 630, and 640 are the immediate conjunctive subqueries of the query, each obtained by removing one of the conjuncts of the query. Nodes 650, 660, and 670 are conjunctive subqueries of the query obtained by removing two conjuncts. Nodes 650, 660, and 670 are also the conjuncts of the conjunctive query. The lines in the Figure represent the edges of the graph. There is an edge 680, for example, between node 640 and node 660, because the query 660 “xyz OR (abc AND def)” is an immediate conjunctive subquery of the query 640 “qwer AND (xyz OR (abc AND def))”. Node 660 is a child of node 640, and node 640 is a parent of node 660.
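The graph of FIG. 6 can be reproduced with a short Python sketch. Nodes are represented as tuples of conjuncts and the graph as a dictionary mapping each node to its children; this representation is an assumption chosen for illustration.

```python
def conjunction_graph(conjs):
    """Map each node (a tuple of conjuncts) to the list of its children."""
    graph = {}
    frontier = [tuple(conjs)]
    while frontier:
        node = frontier.pop()
        if node in graph:
            continue  # a node may be reached through several parents
        if len(node) > 1:
            children = [node[:i] + node[i + 1:] for i in range(len(node))]
        else:
            children = []  # single conjuncts are the bottom of the graph
        graph[node] = children
        frontier.extend(children)
    return graph


DISJ = "(xyz OR (abc AND def))"  # third conjunct, kept as an opaque string
fig6 = conjunction_graph(("asdf", "qwer", DISJ))
```

With three conjuncts the sketch yields seven nodes, matching nodes 610 through 670, and the child list of `("qwer", DISJ)` contains `(DISJ,)`, matching edge 680.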

In one embodiment, the conjunction operator can be omitted, so that juxtaposition is interpreted as conjunction. For example, the query “asdf qwer poiu” is a conjunctive query with implicit conjunction operators, logically equivalent to the query “asdf AND qwer AND poiu” where the conjunction operators are explicit. In one embodiment, no explicit Boolean operators are recognized by the parser, and therefore every query consists either of one search term, or of a sequence of search terms interpreted as a conjunction. FIG. 7 shows the conjunction graph 700 of the conjunctive query “asdf qwer poiu” with implicit conjunction operators.
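For the embodiment in which no explicit Boolean operators are recognized, splitting a query into its conjuncts might look like the following Python sketch. The regular expression and the function names are assumptions; the sketch assumes phrases are delimited by straight double quotes and contain no embedded quotes.

```python
import re

# A search term is either a double-quoted phrase or a run of non-space characters.
_TERM = re.compile(r'"[^"]*"|\S+')


def implicit_conjuncts(query: str):
    """Split a query with implicit conjunction into its search terms."""
    return _TERM.findall(query)


def to_explicit(query: str) -> str:
    """Rewrite juxtaposition as explicit AND operators."""
    return " AND ".join(implicit_conjuncts(query))
```

A query that yields a single term is not a conjunction, consistent with the observation that every query in this embodiment is either one search term or a sequence of terms interpreted as a conjunction.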

A Boolean expression, and in particular a Boolean query, is said to be in negation normal form if it contains no negated conjunctions or disjunctions (i.e. if the negation operator NOT only occurs in the query applied to atomic Boolean expressions). Every Boolean expression can be put in negation normal form using De Morgan's laws.
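Conversion to negation normal form can be sketched in Python as follows, using a nested-tuple representation of queries that is assumed here for illustration. In addition to De Morgan's laws, the sketch eliminates double negations, which is needed to push NOT all the way down to the search terms.

```python
def to_nnf(query):
    """Return an equivalent query in which NOT applies only to search terms."""
    if isinstance(query, str):
        return query  # a search term is already in negation normal form
    op = query[0]
    if op in ("AND", "OR"):
        return (op,) + tuple(to_nnf(operand) for operand in query[1:])
    # op == "NOT"
    inner = query[1]
    if isinstance(inner, str):
        return query  # negated search term: nothing to do
    if inner[0] == "NOT":
        return to_nnf(inner[1])  # double negation elimination
    # De Morgan: NOT (a AND b) == (NOT a) OR (NOT b), and dually for OR.
    dual = "OR" if inner[0] == "AND" else "AND"
    return (dual,) + tuple(to_nnf(("NOT", operand)) for operand in inner[1:])
```

For example, under this representation `NOT (a AND (NOT b))` becomes `(NOT a) OR b`.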

The concept of a “conjunction graph” can be extended to apply to a query in negation normal form, as follows. If the query is a single search term or a negation, the graph has a single node, which is the query itself. If the query is a conjunction, the conjunction graph is as defined before. If the query is a disjunction, the nodes of the graph are the disjuncts of the disjunction, as well as the conjunctive subqueries of any disjuncts that are conjunctions. It should be noted that a node can be a conjunctive subquery of multiple disjuncts. The edges of the graph and the concepts of “parent” and “child” are defined as before. A root is a node that has no parents. If the query is a search term, a negation, or a conjunction, it has a single root; but if it is a disjunction, each disjunct is a root.

The conjunction graph of a disjunction is a subgraph of the conjunction graph of the conjunction obtained by combining the conjuncts of all the disjuncts, which will be referred to as the extended conjunction graph of the query.

As an example, FIG. 8 shows the conjunction graph and the extended conjunction graph of the Boolean query:

    (asdf AND qwer AND (xyz OR (abc AND def))) OR
    (asdf AND (NOT poiu) AND (xyz OR (abc AND def))),

which is in negation normal form. Dashed rounded boxes indicate nodes that are part of the extended graph but not of the non-extended graph. The query has two disjuncts, which are the roots 810 and 820 of the non-extended graph. Nodes such as node 830 which are not part of the non-extended conjunction graph are deemed not to be of interest to user 120 and are excluded from a cooperative answer to the query.
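The relationship between the two graphs of this example can be checked with a Python sketch that treats each node as a set of conjuncts. The set representation and the function names are assumptions for illustration, and conjuncts are kept as opaque strings.

```python
from itertools import combinations


def nonempty_subsets(conjs):
    """All non-empty subsets of a collection of conjuncts, as frozensets."""
    items = sorted(conjs)
    return {
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    }


def disjunction_graph_nodes(disjuncts):
    """Nodes of the conjunction graph: each disjunct and its conjunctive subqueries."""
    nodes = set()
    for disjunct in disjuncts:
        nodes |= nonempty_subsets(disjunct)
    return nodes


def extended_graph_nodes(disjuncts):
    """Nodes of the conjunction graph of the conjunction of all conjuncts."""
    return nonempty_subsets(set().union(*disjuncts))


def roots(nodes):
    """Nodes that have no parent, i.e. no node with exactly one more conjunct."""
    return {n for n in nodes if not any(n < m and len(m) == len(n) + 1 for m in nodes)}


X = "(xyz OR (abc AND def))"
d1 = {"asdf", "qwer", X}
d2 = {"asdf", "(NOT poiu)", X}
graph_nodes = disjunction_graph_nodes([d1, d2])
extended_nodes = extended_graph_nodes([d1, d2])
```

Under this representation the two disjuncts share the subqueries built from “asdf” and the nested disjunction, and the nodes that appear only in the extended graph are those combining “qwer” with “NOT poiu”.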

In one embodiment, when search application 162 provides a cooperative answer to a Boolean query having an empty result set, the cooperative answer consists of the most general nodes in the conjunction graph of the negation normal form of the query that have an empty result set, and the most specific nodes in the same graph that have a non-empty result set. As a special case, if the query is a conjunctive query, the cooperative answer consists of the most general conjunctive subqueries of the query that have an empty result set, and the most specific ones that have a non-empty result set.
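The content of a cooperative answer to a conjunctive query can be illustrated with a brute-force Python sketch. This sketch only specifies what the answer contains; it evaluates every subquery and is not the parallel method described with reference to FIGS. 9 and 10. The `toy_succeeds` predicate is an assumed stand-in for the back-end, chosen to be consistent with the FIG. 3 discussion.

```python
from itertools import combinations


def cooperative_answer(terms, succeeds):
    """Return (most specific succeeding, most general failing) subqueries.

    `succeeds` is a caller-supplied predicate over frozensets of terms.
    """
    terms = list(terms)
    subqueries = [
        frozenset(combo)
        for size in range(1, len(terms))  # proper, non-empty subsets
        for combo in combinations(terms, size)
    ]
    ok = [s for s in subqueries if succeeds(s)]
    bad = [s for s in subqueries if not succeeds(s)]
    # Keep a succeeding subquery only if no succeeding subquery has more terms.
    most_specific_ok = [s for s in ok if not any(s < t for t in ok)]
    # Keep a failing subquery only if no failing subquery has fewer terms.
    most_general_bad = [s for s in bad if not any(t < s for t in bad)]
    return most_specific_ok, most_general_bad


def toy_succeeds(subquery):
    # Assumed toy index: nothing matches "xyz13924", and nothing matches
    # all three of "smoothie", "asdf" and "qwer" together.
    return "xyz13924" not in subquery and not {"smoothie", "asdf", "qwer"} <= subquery


succeeding, failing = cooperative_answer(
    ["smoothie", "asdf", "qwer", "xyz13924"], toy_succeeds
)
```

With this toy predicate the sketch returns three two-term succeeding subqueries and the two failing subqueries “smoothie asdf qwer” and “xyz13924”; the third succeeding subquery it yields, “asdf qwer”, follows from the toy model rather than from the figure.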

FIGS. 9 and 10 illustrate two processes used by search application 162 to perform a method of providing a cooperative answer to a conjunctive query submitted by user 120 that has an empty result set. To compute the cooperative answer, search application 162 makes requests to search back-end 140 through the network interface 142. Each request specifies a query that is a conjunctive subquery of the submitted query, and produces a response that includes a cardinality estimate for the specified query.

Search application 162 makes sets of requests in parallel. A set of requests is made in parallel, by definition, if each request is made without waiting for responses to previous requests to be received. By making use of parallel requests, the present implementation is able to reduce the delay that would be caused by multiple serial interactions with the network interface. In one embodiment, where the latency of network interface 142 with respect to the computer 110, including the latency due to network 130, is a few hundred milliseconds, experiments show that the present implementation is able to provide a cooperative answer in only a few seconds for queries that necessitate interactions with network interface 142 that would take 30 seconds or more if done serially.
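The difference between serial and parallel requests can be illustrated with the following Python sketch. The `fake_backend` function and its 50 ms delay are stand-ins invented for illustration, and a thread pool is used here only as one convenient way to issue requests without waiting for earlier responses; the described embodiment is event-driven rather than thread-based.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_SECONDS = 0.05  # assumed stand-in for the network latency


def fake_backend(query: str) -> int:
    """Hypothetical back-end call returning a cardinality estimate."""
    time.sleep(LATENCY_SECONDS)  # simulate the round trip
    return 0 if "xyz13924" in query else 1000


def fetch_serially(queries):
    # Each request waits for the previous response: about n * latency in total.
    return [fake_backend(q) for q in queries]


def fetch_in_parallel(queries):
    # All requests are issued before any response is awaited:
    # about one latency in total, whatever the number of requests.
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        return list(pool.map(fake_backend, queries))
```

Both functions return the same estimates in the same order; only the elapsed time differs, which is the effect the parallel sets of requests rely on.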

Search application 162 uses a first counter to keep track of the number of requests that have been made since the beginning of the computation, and a second counter to keep track of the number of responses that have been received. The first counter is called the “counter of requests made” and the second counter is called the “counter of responses received”. The value of the first counter minus the value of the second counter is equal to the number of pending requests. The computation ends when the number of pending requests is equal to zero. The two counters are kept in the storage facility 160.
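The bookkeeping performed with the two counters can be sketched in Python as follows. The class and method names are hypothetical.

```python
class RequestCounters:
    """Tracks requests made and responses received during one computation."""

    def __init__(self):
        self.requests_made = 0
        self.responses_received = 0

    def note_requests(self, count: int) -> None:
        """Record that a set of `count` requests has just been made."""
        self.requests_made += count

    def note_response(self) -> None:
        """Record that one response has been received."""
        self.responses_received += 1

    @property
    def pending(self) -> int:
        return self.requests_made - self.responses_received

    @property
    def finished(self) -> bool:
        # Meaningful only once the initial set of requests has been made.
        return self.pending == 0
```

Because a negative response may trigger a further set of requests, the number of pending requests can rise as well as fall before it finally reaches zero.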

Search application 162 constructs in storage facility 160 a list of zero-result queries that it uses to collect failing conjunctive subqueries of the submitted query during the computation. At the end of the computation, the list of zero-result queries contains the most general failing conjunctive subqueries of the submitted query.

In the descriptions of FIGS. 9 and 10, the meaning of the terms “parent” and “child” is relative to the conjunction graph of the submitted query.

FIG. 9 is a flow diagram generally illustrating one embodiment of a process 900 employed by search application 162 to initiate the computation of a cooperative answer to a conjunctive query submitted by user 120. At 910, search application 162 creates an empty list of zero-result queries and proceeds to 920.

At 920, search application 162 initializes the counter of responses received, setting the value of the counter to zero, and then proceeds to 930. At 930, search application 162 makes an initial set of requests to search back-end 140, the initial set of requests consisting of requests specifying queries that are children of the submitted query, one such request for each child, and initializes the counter of requests made to the number of such requests.

FIG. 10 is a flow diagram generally illustrating one embodiment of a process 1000 employed by search application 162 when a response is received, the response fulfilling a request. At 1010, search application 162 increments the counter of responses received and proceeds to 1020. At 1020, search application 162 checks if the cardinality estimate contained in the response is zero. If so, search application 162 proceeds to 1030. Otherwise, search application 162 proceeds to 1050.

At 1030, search application 162 removes from the list of zero-result queries any query subsumed by the query specified by the fulfilled request, and then adds the specified query to the list. A query in the list is subsumed by the specified query if the specified query is a conjunctive subquery of the query in the list, and therefore is more general than the query in the list. Search application 162 then proceeds to 1040.
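The list maintenance at 1030 can be sketched as below, assuming that a conjunctive query is modeled as a frozenset of terms so that "is a conjunctive subquery of" becomes the subset test. The function name `record_failure` is hypothetical, and the final append follows the wording of claims 9 and 14, which add the failing query to the list after the subsumed queries are removed.

```python
def record_failure(zero_result, failed_query):
    """Return the updated list of zero-result queries after a negative response.

    A listed query is subsumed when the newly failed query is a subquery of it,
    that is, when the failed query's terms are a subset of the listed query's.
    """
    kept = [q for q in zero_result if not failed_query <= q]
    kept.append(failed_query)
    return kept
```

For example, if the list holds the query with terms a, b and c and the query with terms a and b then fails, the three-term query is dropped and only the more general two-term query remains.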

At 1040, search application 162 makes a subsequent set of requests and increments accordingly the counter of requests made, the requests specifying queries that are children of the query specified by the fulfilled request, the set consisting of one such request for each child whose parents have all failed. Search application 162 then proceeds to 1060.

In one embodiment, search application 162 tests whether the parents of a node have all failed by keeping a count of the number of times that a request specifying the node has been fulfilled and checking whether the count is equal to the number of parents of the node. In another embodiment, search application 162 tests such condition by remembering what nodes have failed.
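The second of these tests, remembering which nodes have failed, might look like the following sketch. It assumes the same frozenset model of conjunctive queries, assumes that the parents of a node are obtained by adding back one term of the submitted query, and assumes that the submitted query itself is recorded as failed from the start. The names `parents` and `all_parents_failed` are hypothetical.

```python
def parents(node, submitted):
    """Parents of a node: the node with one missing term of the submitted query restored."""
    return [node | {term} for term in submitted - node]

def all_parents_failed(node, submitted, failed):
    """True when every parent of the node is in the set of remembered failed nodes."""
    return all(parent in failed for parent in parents(node, submitted))
```

A request for a child is made only when this test holds, so a node whose result set could still be covered by a succeeding parent is never requested.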

At 1050, search application 162 displays the query specified by the fulfilled request, together with the cardinality estimate contained in the response, in the left panel 230, and then proceeds to 1060. At 1060, search application 162 checks whether the number of requests made equals the number of responses received according to the respective counters. If so, search application 162 proceeds to 1070. Otherwise, the process 1000 terminates. At 1070, search application 162 displays in the left panel 230 the queries contained in the list of zero-result queries, each with a cardinality estimate of zero.
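Processes 900 and 1000 can be sketched together as a single asyncio coroutine. This is an illustrative sketch under several assumptions, not the implementation of the embodiments: queries are frozensets of terms, a child drops one term and the empty query is excluded, `DOCS` and `fetch_count` are hypothetical stand-ins for search back-end 140, "displaying" is modeled by collecting results in a dictionary, and the event-driven process 1000 is folded into a loop that handles responses as they complete. Comments refer to the reference numerals of FIGS. 9 and 10.

```python
import asyncio

# Toy stand-in for search back-end 140: each document is a set of terms.
DOCS = [{"red", "shoes"}, {"leather", "shoes"}, {"shoes"}]

async def fetch_count(query):
    """Hypothetical high-latency request returning a cardinality for the query."""
    await asyncio.sleep(0.01)  # simulated latency
    return sum(1 for doc in DOCS if query <= doc)

def children(query):
    # Assumed graph shape: a child drops one term; the empty query is excluded.
    return [query - {t} for t in sorted(query)] if len(query) > 1 else []

def parents(node, submitted):
    return [node | {t} for t in submitted - node]

async def cooperative_answer(submitted, fetch=fetch_count):
    submitted = frozenset(submitted)
    failed = {submitted}   # the submitted query is known to have no results
    zero_result = []       # 910: empty list of zero-result queries
    received = 0           # 920: counter of responses received
    made = 0               # counter of requests made
    displayed = {}         # stands in for left panel 230
    pending = set()

    async def ask(query):
        return query, await fetch(query)

    to_request = children(submitted)  # 930: initial set of requests
    while True:
        for q in to_request:  # requests are made without awaiting responses
            pending.add(asyncio.create_task(ask(q)))
            made += 1
        if made == received:  # 1060: no pending requests remain
            break
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        to_request = []
        for task in done:
            query, estimate = task.result()
            received += 1  # 1010
            if estimate == 0:  # 1020: negative response
                failed.add(query)
                # 1030: drop subsumed queries, then record the failing query
                zero_result = [z for z in zero_result if not query <= z]
                zero_result.append(query)
                # 1040: request children whose parents have all failed
                to_request += [c for c in children(query)
                               if all(p in failed for p in parents(c, submitted))]
            else:
                displayed[query] = estimate  # 1050: positive response
    return displayed, zero_result  # 1070: zero-result queries shown at the end
```

Running `asyncio.run(cooperative_answer({"red", "leather", "shoes"}))` on the toy collection makes three initial requests in parallel. The two-term queries containing shoes succeed and are collected with their estimates; the query with terms red and leather fails and is placed in the list, and no request is made for its children because each of them has a succeeding parent.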

FIGS. 11 and 12 are flow diagrams generally illustrating two processes used by search application 162 to perform a method of providing a cooperative answer to a Boolean query submitted by user 120 that has an empty result set. The method is an extension of the method of providing a cooperative answer to a conjunctive query illustrated by FIGS. 9 and 10. As before, search application 162 makes sets of requests to search back-end 140 in parallel, uses a counter of requests made and a counter of responses received, and builds a list of zero-result queries.

FIG. 11 is a flow diagram generally illustrating one embodiment of a process 1100 employed by search application 162 to initiate the computation of a cooperative response to the Boolean query submitted by user 120. At 1110, search application 162 puts the Boolean query in negation normal form and proceeds to 1120.
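The conversion at 1110 can be sketched with a small recursive function. The representation is an assumption made for illustration: a term is a string, and compound queries are tuples of the form `("and", ...)`, `("or", ...)` or `("not", q)`. The function name `to_nnf` is hypothetical. The rewriting itself is the standard one: double negations are removed and De Morgan's laws push each negation down until it applies only to a term.

```python
def to_nnf(query, negate=False):
    """Return the negation normal form of a Boolean query.

    `negate` records whether an odd number of enclosing negations
    is being pushed down onto this subquery.
    """
    if isinstance(query, str):
        return ("not", query) if negate else query
    op = query[0]
    if op == "not":
        # A negation flips the pending polarity; two negations cancel.
        return to_nnf(query[1], not negate)
    if op in ("and", "or"):
        if negate:
            # De Morgan: a negated conjunction becomes a disjunction and vice versa.
            op = "or" if op == "and" else "and"
        return (op,) + tuple(to_nnf(arg, negate) for arg in query[1:])
    raise ValueError(f"unknown operator: {op!r}")
```

For example, the negation of `a AND (b OR NOT c)` becomes `NOT a OR (NOT b AND c)`, in which negation is applied only to individual terms.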

At 1120, search application 162 creates an empty list of zero-result queries, and then proceeds to 1130. At 1130, search application 162 initializes the counter of responses received, setting the value of the counter to zero, and then proceeds to 1140. At 1140, search application 162 makes an initial set of requests to search back-end 140, consisting of requests specifying queries that are children of the roots of the conjunction graph of the negation normal form of the submitted query, one such request for each child, and initializes the counter of requests made to the number of such requests.

FIG. 12 is a flow diagram generally illustrating one embodiment of a process 1200 employed by search application 162 when a response is received, the response fulfilling a request. The description of the process is not repeated here because it is substantially similar to the description of corresponding portions of process 1000, except at 1240, which differs from 1040.

At 1240, search application 162 makes a subsequent set of requests and increments accordingly the counter of requests made, the requests specifying queries that are children of the query specified by the fulfilled request, the set consisting of one such request for each child whose parents in the conjunction graph of the negation normal form have all failed.

In one embodiment, search application 162 tests whether the parents of a node have all failed by remembering what nodes have failed. It is noted that when enumerating the parents to check if they have all failed, care must be taken to avoid the error of including parents of the node in the extended graph that are not parents of the node in the non-extended graph.

In summary, the embodiments described enable the computation of cooperative answers through any high-latency API in a much shorter period of time than is possible with traditional methods. One important case of a high-latency API is a Web API. Embodiments that utilize a Web API make it practical to provide cooperative answers in a split search engine architecture. However, in addition to Web APIs, the teachings of the present embodiments may be applied to any type of high-latency API, such as network interfaces based on protocols other than HTTP (e.g. LDAP), and interfaces that are not necessarily accessed across a network, such as database interfaces. Furthermore, although time savings provided by the present embodiments increase with the API latency, time savings can be realized for any latency and, thus, the benefits of the present embodiments can be realized in embodiments that utilize any API.

According to one embodiment, the teachings of the browsing techniques described by the companion application cross-referenced above are employed so that result sets of the queries that comprise a cooperative answer can be browsed simultaneously, or in an interleaved fashion.

Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. This application is intended to cover any adaptations or variations of the specific embodiments discussed herein.

Claims

1. An information retrieval system comprising:

a computer operated by a user, the computer having a display and a storage facility;
a search back-end containing an index into a collection of documents, the index defining results sets of queries relative to the collection of documents; and
a search application stored in the storage facility which is executable by the computer to perform a method of providing a cooperative answer to a query submitted by the user, the submitted query having an empty result set, the cooperative answer comprising subqueries of the submitted query and, for each subquery, cardinality information indicative of whether the subquery has an empty result set, the method comprising:
making an initial set of requests to the search back-end, a request specifying a query and calling for a response that provides cardinality information indicative of whether the subquery has an empty result set, the response being said to be positive if the result set is non-empty and negative if the result set is empty, the requests in the initial set of requests being made in parallel and specifying queries that are subqueries of the submitted query;
when a positive response to a request is received, displaying the query specified by the request together with cardinality information provided by the response which enables the user to determine that the result set of the specified query is not empty, the specified query and the cardinality information being part of the cooperative answer; and
when a negative response to a request is received, making a subsequent set of requests to the search back-end, the requests being made in parallel and specifying queries that are subqueries of the query specified by the request for which the negative response is received.

2. The system of claim 1, wherein the computer is connected to the search back-end via a network and the search application uses a network interface exposed by the search back-end to make requests to the search back-end and obtain responses.

3. The system of claim 2, wherein the network interface comprises an HTTP-based interface.

4. The system of claim 3, wherein the network is the Internet.

5. The system of claim 4, wherein the collection of documents comprises a collection of files reachable on the World Wide Web.

6. A method of providing a cooperative answer to a query submitted by a user to an information retrieval system, the submitted query having an empty result set, the cooperative answer comprising subqueries of the submitted query and, for each subquery, cardinality information indicative of whether the subquery has an empty result set, the information retrieval system comprising a computer operated by the user, the computer having a display, the method comprising:

making an initial set of requests to a search back-end, each request specifying a query and calling for a response that provides cardinality information indicative of whether the subquery has an empty result set, the response being said to be positive if the result set is non-empty, and the response being said to be negative and the query said to have failed and to be a failing query if the result set is empty, the requests in the initial set of requests being made in parallel and specifying queries that are subqueries of the submitted query;
when a positive response to a request is received, displaying the query specified by the request together with cardinality information provided by the response which enables the user to determine that the result set of the specified query is not empty, the specified query and the cardinality information being part of the cooperative answer; and
when a negative response to a request is received, making a subsequent set of requests to the search back-end, the requests being made in parallel and specifying queries that are subqueries of the query specified by the request for which the negative response is received.

7. The method of claim 6, wherein the submitted query is a conjunctive query and the queries specified by the initial set of requests are children of the submitted query in a conjunction graph of the submitted query.

8. The method of claim 7, wherein the queries specified by the subsequent set of requests made when the response is a negative response, which is a response to a request that specifies a failing query, are those nodes of a conjunction graph of the submitted query that are children of the failing query and whose parents have all failed.

9. The method of claim 8, further comprising:

creating an empty list of zero-result queries before making the initial set of requests;
when a negative response is received, the negative response being a response to a request that specifies a failing query, removing from the list of zero-result queries every query subsumed by the failing query, then adding the failing query to the list; and
when there are no pending requests and all requests in initial and subsequent sets of requests have been fulfilled, displaying the zero-result queries contained in the list together with cardinality information that enables the user to determine that every such displayed query has an empty result set, the queries thus displayed and the cardinality information being part of the cooperative answer.

10. The method of claim 9, wherein, when a positive response to a request is received, the cardinality information provided by the response and displayed together with the query specified by the request is an estimate of the number of results in the result set of the query.

11. A method of providing a cooperative answer to a Boolean query submitted by a user to an information retrieval system, the submitted query having an empty result set, the cooperative answer comprising subqueries of a negation normal form of the submitted query and, for each subquery, cardinality information indicative of whether the subquery has an empty result set, the system comprising a computer operated by the user, the computer having a display, the method comprising:

making an initial set of requests to a search back-end, a request specifying a query and calling for a response that provides cardinality information indicative of whether the subquery has an empty result set, the response being said to be positive if the result set is non-empty and negative if the result set is empty, the requests in the initial set of requests being made in parallel and specifying queries that are subqueries of a negation normal form of the submitted query;
when a positive response to a request is received, displaying the query specified by the request together with cardinality information provided by the response which enables the user to determine that the result set of the specified query is not empty, the specified query and the cardinality information being part of the cooperative answer; and
when a negative response to a request is received, making a subsequent set of requests to the search back-end, the requests being made in parallel and specifying queries that are subqueries of the query specified by the request for which the negative response is received.

12. The method of claim 11, wherein the specified queries of the initial set of requests are those nodes of a conjunction graph of a negation normal form of the submitted query that are children of nodes that are roots of the graph.

13. The method of claim 12, wherein the specified queries of the subsequent set of requests made when the response is a negative response, which is a response to a request that specifies a failing query, are those nodes of a conjunction graph of a negation normal form of the submitted query that are children of the failing query and whose parents have all failed.

14. The method of claim 13, further comprising:

creating an empty list of zero-result queries before making the initial set of requests;
when a negative response is received, the negative response being a response to a request that specifies a failing query, removing from the list of zero-result queries every query subsumed by the failing query, then adding the failing query to the list; and
when there are no pending requests and all requests in initial and subsequent sets of requests have been fulfilled, displaying the zero-result queries contained in the list together with cardinality information that enables the user to determine that every such displayed query has an empty result set, the queries thus displayed and the cardinality information being part of the cooperative answer.

15. The method of claim 14, wherein, when a positive response to a request is received, the cardinality information provided by the response and displayed together with the query specified by the request is an estimate of the number of results in the result set of the query.

16. A computer readable storage medium storing computer executable instructions for controlling a computing device to perform a method of providing a cooperative answer to a query submitted by a user via a computing device to a search back-end containing an index into a collection of documents and defining result sets of queries relative to the collection of documents, the method comprising:

making an initial set of requests to the search back-end, a request specifying a query and calling for a response that provides cardinality information indicative of whether the subquery has an empty result set, the response being said to be positive if the result set is non-empty and negative if the result set is empty, the requests in the initial set of requests being made in parallel and specifying queries that are subqueries of the submitted query;
when a positive response to a request is received, displaying the query specified by the request together with cardinality information provided by the response which enables the user to determine that the result set of the specified query is not empty, the specified query and the cardinality information being part of the cooperative answer; and
when a negative response to a request is received, making a subsequent set of requests to the search back-end, the requests being made in parallel and specifying queries that are subqueries of the query specified by the request for which the negative response is received.

17. The computer readable storage medium of claim 16, wherein the computing device is connected to the search back-end via a network and the search application uses a network interface exposed by the search back-end to make requests to the search back-end and obtain responses.

18. The computer readable storage medium of claim 17, wherein the network interface comprises an HTTP-based interface.

19. The computer readable storage medium of claim 18, wherein the network is the Internet.

20. The computer readable storage medium of claim 19, wherein the collection of documents comprises a collection of files reachable on the World Wide Web.

Patent History
Publication number: 20100100563
Type: Application
Filed: Oct 19, 2009
Publication Date: Apr 22, 2010
Inventors: Francisco Corella (San Diego, CA), Karen Pomian Lewison (San Diego, CA)
Application Number: 12/581,859
Classifications
Current U.S. Class: Distributed Search And Retrieval (707/770); Query Processing For The Retrieval Of Structured Data (epo) (707/E17.014)
International Classification: G06F 17/30 (20060101);