UNIFIED RANKING WITH ENTROPY-WEIGHTED INFORMATION FOR PHRASE-BASED SEMANTIC AUTO-COMPLETION

Methods, systems, and computer-readable media related to a technique for combining two or more aspects of predictive information for auto-completion of user input, in particular, user commands directed to an intelligent digital assistant. Specifically, predictive information based on (1) usage frequency, (2) usage recency, and (3) semantic information encapsulated in an ontology (e.g., a network of domains) implemented by the digital assistant, are integrated in a balanced and sensible way within a unified framework, such that a consistent ranking of all completion candidates across all domains may be achieved. Auto-completions are selected and presented based on the unified ranking of all completion candidates.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Ser. No. 61/832,749, filed on Jun. 7, 2013, entitled UNIFIED RANKING WITH ENTROPY-WEIGHTED INFORMATION FOR PHRASE-BASED SEMANTIC AUTO-COMPLETION, which is hereby incorporated by reference in its entirety for all purposes.

FIELD OF INVENTION

This specification relates to predictive suggestion and/or auto-completion of user input, and more specifically, to predictive suggestions and/or auto-completions of user commands directed to an intelligent digital assistant.

BACKGROUND

Auto-completion of user queries improves the efficiency and ease by which users interact with a database or a search service. Word-level auto-completion allows a user to type the first few characters of an intended input word, and select the intended input word from a list of most likely completions of the inputted characters. Phrase-level auto-completion allows a user to provide the first few characters or the first one or more words of an intended input phrase, and select the intended input phrase from a list of most likely completions of the inputted characters or word(s).

Many techniques, statistical or otherwise, have been employed to improve the accuracy (e.g., as measured by user selection) of the list of auto-completions provided to the user. However, different state of the art techniques are biased toward different aspects of predictive information, and are not conducive to being combined with other techniques. As a result, it is challenging to provide an integrated method that effectively combines multiple aspects of predictive information to generate suitable completion candidates for the user.

SUMMARY

This specification describes a technique that uses at least two or more aspects of predictive information for auto-completion of user input, in particular, user commands directed to an intelligent digital assistant. Specifically, predictive information based on (1) usage frequency, (2) usage recency, and (3) semantic information encapsulated in an ontology (e.g., a network of domains) implemented by the digital assistant, are integrated in a balanced and effective way within a unified framework, such that a consistent ranking of all completion candidates across all domains may be achieved.

In one aspect, a method of providing cross-domain semantic ranking of complete input phrases for a digital assistant includes receiving a training corpus comprising a collection of complete input phrases that span a plurality of semantically distinct domains; for each of a plurality of distinct words present in the collection of complete input phrases, calculating a respective word indexing power across the plurality of domains based on a respective normalized entropy for said word, wherein the respective normalized entropy is based on a total number of domains in which said word appears and how representative said word is for each of the plurality of domains; for each complete input phrase in the collection of complete input phrases, calculating a respective phrase indexing power across the plurality of domains based on an aggregation of the respective word indexing powers of all constituent words of said complete input phrase; obtaining respective domain-specific usage frequencies of the complete input phrases in the training corpus; and generating a cross-domain ranking of the collection of complete input phrases based at least on the respective phrase indexing powers of the complete input phrases and the respective domain-specific usage frequencies of the complete input phrases.

In some embodiments, the method further includes: providing the cross-domain ranking of the collection of complete input phrases to a user device, wherein the user device presents one or more auto-completion candidates in response to an initial user input in accordance with at least the cross-domain ranking of the collection of complete input phrases.

In some embodiments, calculating the respective word indexing power across the plurality of domains for each word wi of the plurality of distinct words further includes: calculating the respective normalized entropy εi for the word wi based on a respective formula

\varepsilon_i = -\frac{1}{\log K} \sum_{k=1}^{K} \frac{c_{i,k}}{t_i} \log \frac{c_{i,k}}{t_i},

wherein K is a total number of domains in the plurality of domains, ci,k is a total number of times wi occurs in a domain dk of the plurality of domains, and ti = Σk ci,k is a total number of times wi occurs in the collection of complete input phrases, and wherein the respective word indexing power of the word wi is (1−εi).
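
For illustration, the calculation above can be sketched in a few lines of Python. This is a minimal sketch only; it assumes the training corpus is available as (domain, tokenized phrase) pairs, and the function name and data layout are not part of the disclosure.

```python
import math
from collections import defaultdict

def word_indexing_powers(corpus):
    """Compute the word indexing power (1 - epsilon_i) for every distinct word,
    where epsilon_i is the normalized entropy of the word's counts across domains."""
    domains = set()
    counts = defaultdict(lambda: defaultdict(int))   # word -> domain -> count c_{i,k}
    for domain, phrase_words in corpus:
        domains.add(domain)
        for word in phrase_words:
            counts[word][domain] += 1

    K = len(domains)
    powers = {}
    for word, per_domain in counts.items():
        t_i = sum(per_domain.values())                # total occurrences of the word
        entropy = 0.0
        for c_ik in per_domain.values():
            p = c_ik / t_i
            entropy -= p * math.log(p)
        epsilon_i = entropy / math.log(K) if K > 1 else 0.0   # normalized entropy
        powers[word] = 1.0 - epsilon_i                # word indexing power
    return powers
```

A word concentrated in a single domain has normalized entropy near zero and indexing power near one, whereas a word spread evenly across all K domains has normalized entropy near one and indexing power near zero.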

In some embodiments, calculating the respective phrase indexing power across the plurality of domains for each complete input phrase Pj of the collection of complete input phrases further includes: distinguishing template words from normal words in the complete input phrase Pj, a template word being a word that is used to represent a respective category of normal words in a particular complete input phrase and that is substituted by one or more normal words when provided as an input to the digital assistant by a user; calculating the respective phrase indexing power, μj, for the complete input phrase Pj based on a respective formula

\mu_j = \frac{b_T\, n_T(j) + 1}{n_N(j) + n_T(j)} \left[ \sum_{i=1}^{n_N(j)} (1 - \varepsilon_i) + \sum_{i=1}^{n_T(j)} (1 - \varepsilon_i) \right],

wherein nN(j) is a total number of normal words present in the complete input phrase Pj, nT(j) is a total number of template words present in the complete input phrase Pj, (1−εi) is the respective word indexing power of each word wi, and bT is a respective template bias multiplier used to calculate the weight bias bT nT(j) for the input phrase Pj.
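
A sketch of this aggregation is given below for illustration only; it assumes the reconstruction of the formula above (a template-biased average of the constituent word indexing powers), and the value chosen for the template bias multiplier is arbitrary.

```python
def phrase_indexing_power(normal_words, template_words, word_power, b_T=1.5):
    """Aggregate per-word indexing powers (1 - epsilon_i) into a phrase
    indexing power mu_j, biasing phrases that contain template words."""
    n_N, n_T = len(normal_words), len(template_words)
    if n_N + n_T == 0:
        return 0.0
    total = sum(word_power.get(w, 0.0) for w in normal_words)
    total += sum(word_power.get(w, 0.0) for w in template_words)
    weight = (b_T * n_T + 1) / (n_N + n_T)   # template-biased averaging weight
    return weight * total
```

Under this reading, a phrase with no template words reduces to the plain average of the indexing powers of its normal words, while each template word boosts the averaging weight through the bias multiplier.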

In some embodiments, the respective template bias multiplier is a positive real number, and the method further includes adjusting the respective template bias multiplier based on a performance evaluation of the cross-domain ranking in providing auto-completion candidates for user input.

In some embodiments, receiving a training corpus further includes: collecting a plurality of user input phrases from a usage log of the digital assistant; identifying at least one template input phrase based on a common phrase pattern present in two or more of the plurality of user input phrases; and normalizing the plurality of user input phrases by substituting at least one word in each of said two or more user input phrases with a respective template word representing a generalization of the at least one word.

In some embodiments, generating the cross-domain ranking of the collection of complete input phrases further includes: calculating a respective integrated ranking score Rj for each complete input phrase Pj of the collection of complete input phrases based on a respective formula

R_j = \frac{\omega_v\, v_j + \omega_\mu\, \mu_j}{\omega_v + \omega_\mu},

wherein vj is a respective frequency of the complete input phrase Pj that has been normalized for cross-domain comparison, μj is the respective phrase indexing power of the complete input phrase Pj across the plurality of domains, and ωv and ωμ are relative weights given to domain-specific frequency and cross-domain indexing power, respectively, in the ranking of the collection of complete input phrases; and generating the cross-domain ranking of the collection of complete input phrases based on the respective integrated ranking scores for the collection of complete input phrases.

In some embodiments, the method further includes: for each complete input phrase Pj, normalizing the respective domain-specific frequency of said complete input phrase Pj by a maximum phrase count of a single input phrase observed in the training corpus.
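
Combining the two embodiments above, a minimal sketch of the integrated scoring might look as follows; the equal weights and the normalization by the maximum single-phrase count follow the description, while the function and variable names are illustrative.

```python
def integrated_ranking_scores(phrase_counts, phrase_power, w_v=1.0, w_mu=1.0):
    """Combine normalized usage frequency v_j with phrase indexing power mu_j
    into an integrated ranking score R_j for every complete input phrase."""
    max_count = max(phrase_counts.values())       # maximum single-phrase count in the corpus
    scores = {}
    for phrase, count in phrase_counts.items():
        v_j = count / max_count                   # frequency normalized for cross-domain comparison
        mu_j = phrase_power.get(phrase, 0.0)
        scores[phrase] = (w_v * v_j + w_mu * mu_j) / (w_v + w_mu)
    return scores

# Phrases are then ranked by sorting on the integrated score:
# ranked = sorted(scores, key=scores.get, reverse=True)
```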

In some embodiments, the method further includes: updating the respective rank of each complete input phrase Pj based on a user-specific recency bias bR, wherein the user-specific recency bias is based on a number of times that a particular user has used the complete input phrase Pj.

In some embodiments, the method further includes: receiving the initial user input from a user; identifying, from the collection of complete input phrases, a subset of complete input phrases that each begins with the initial user input; ranking the subset of complete input phrases in accordance with the cross-domain ranking of the collection of complete input phrases; selecting a predetermined number of unique relevant domains based on respective domains associated with each of the subset of complete input phrases; and selecting at least one top-ranked input phrase from each of the unique relevant domains as one of the auto-completion candidates to be presented to the user.
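
A simple sketch of this candidate selection step is shown below; it assumes the phrases are already sorted by the cross-domain ranking and that each phrase is labeled with its domain, and the cap of four domains is only an example.

```python
def select_auto_completions(prefix, ranked_phrases, phrase_domain, max_domains=4):
    """Select auto-completion candidates for an initial user input, taking the
    top-ranked matching phrase from each of a limited number of domains."""
    matches = [p for p in ranked_phrases if p.lower().startswith(prefix.lower())]
    candidates, seen_domains = [], set()
    for phrase in matches:                        # matches inherit the cross-domain ranking order
        domain = phrase_domain[phrase]
        if domain in seen_domains:
            continue                              # keep only the top phrase per domain
        if len(seen_domains) == max_domains:
            break                                 # cap on the number of unique relevant domains
        seen_domains.add(domain)
        candidates.append(phrase)
    return candidates
```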

In some embodiments, presenting one or more auto-completion candidates further includes: displaying a first portion of a first auto-completion candidate of the one or more auto-completion candidates, wherein the first portion of the first auto-completion candidate precedes or ends at a respective template word in the first auto-completion candidate; receiving a subsequent user input specifying one or more normal words corresponding to the respective template word; and in response to receiving the subsequent user input, displaying a second portion of the first auto-completion candidate succeeding the respective template word in the first auto-completion candidate.

In some embodiments, the method further includes displaying one or more suggestions corresponding to the respective template word based on a user-specific vocabulary comprising at least one of a plurality of proper nouns associated with the user, wherein the subsequent user input is a selection of the one or more displayed suggestions.

One or more embodiments of the techniques described herein may offer the following advantages. For example, by creating a unified ranking of complete input phrases that combines multiple sources of predictive information (e.g., semantic, frequency, and recency information) in a balanced and effective manner, the selection and ranking of candidate completions for a given user input is more consistent and accurate, and more likely to result in user adoption. The auto-completion candidates provided to the user help the user enter input with minimal effort and with greater speed and efficiency. In addition, the auto-completion candidates also provide an opportunity to guide the user in generating well-formed commands and requests to the digital assistant, resulting in better performance of the digital assistant and a better user experience.

The details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an environment in which an exemplary auto-completion system operates in accordance with some embodiments.

FIG. 2 is a block diagram of an exemplary auto-completion system in accordance with some embodiments.

FIG. 3 is a block diagram of an exemplary system that provides entropy-weighted phrase-level auto-completions in accordance with some embodiments.

FIGS. 4A-4C are flow charts of an exemplary process for providing auto-completions for a user input in accordance with some embodiments.

Like reference numerals refer to corresponding parts throughout the drawings.

DETAILED DESCRIPTION

Word-level auto-completion is now routinely performed on computing devices, such as computers, smart phones and other mobile devices. Users have become accustomed to typing the first few characters (e.g., “s-e-a”) of an intended input word (e.g., “search”) and selecting the intended input word from a list of most likely completion candidates (e.g., “search” and “seafood”) that have been presented by the device in response to the entry of the first few characters. The selected auto-completion candidate is then used as the complete user input for the computing device, e.g., as a search query. Conventionally, the completion candidates are identified from a dictionary and ordered in decreasing order of historic usage frequency in a particular software application or applications.

Cross-word prediction has also become common, especially in cloud-based applications, where large n-gram language models can be leveraged to compute the probability that a particular word follows a given input pattern (e.g., the first few characters or words inputted by the user). Phrase-level auto-completion is also possible. For example, given the first one or more words (e.g., "eye") inputted by a user, one or more complete inputs having multiple words (i.e., a phrase or sentence, such as "Eye of the Tiger" or "eye exam") can be identified and provided to the user for selection.

Auto-completion has also found applications in completing or suggesting user commands directed to an intelligent digital assistant. When one or more characters or words (e.g., "Find") of an acceptable user command (e.g., "Find a restaurant nearby.") are provided to the intelligent digital assistant, the intelligent digital assistant provides a list of most likely completions and/or suggestions (e.g., "Find a restaurant nearby," "Find a downtown parking garage," etc.) for the user input to the user for selection. In response to the user selecting one of the completions or suggestions, the digital assistant performs one or more tasks (e.g., performs a restaurant or parking garage search) for the user in accordance with the selected completion or suggestion.

Many techniques have been employed to improve the accuracy (e.g., as measured by user selection) of the auto-completion candidates provided to the user. However, different state of the art techniques are biased toward different aspects of predictive information, and are not conducive to being combined with other techniques. There are three aspects of predictive information that can be used: (1) usage frequency, (2) usage recency, and (3) semantic information encapsulated in an ontology (e.g., as a network of domains) implemented by the digital assistant.

Usage frequency information is aggregated information on how frequently a particular input phrase or input pattern (e.g., “Find me a restaurant nearby.”) has occurred over a period of time in the input history of a particular user or a multitude of users. A higher usage frequency of a particular input phrase predicts that the input phrase has a higher likelihood of being selected by the user when presented as an auto-completion candidate for the current user input. Frequency-based auto-completion does not scale well to phrases of arbitrary lengths (e.g., as in a natural language input), because long input phrases tend to be rare in the usage data log. The lack of sufficient usage data for longer input phrases and the increased demand on data storage imposed by the longer input phrases tend to diminish the utility of frequency-based identification and ranking of auto-completion candidates.

Usage recency information is aggregated information on how frequently a particular input phrase or input pattern has occurred in the recent input history of a particular user. The recent input history is optionally defined as the input history during a fixed time window (e.g., a week or three months) going back from the present time. A more frequent usage of a particular input phrase within a user's recent input history predicts that the input phrase has a higher likelihood of being selected by the user when presented as an auto-completion candidate for a current user input. User-specific recency information is frequently used in conjunction with aggregated usage frequency information in identifying and ranking the completion candidates presented to the user. However, recency-based identification and ranking of completion candidates can lead to a self-induced bias, such as excessive emphasis on certain candidate patterns to the detriment of others. For example, completion candidates that were less frequently presented or given lower ranks in the initial period of time after activation of the auto-completion functionality will have little chance of recovering from the initial disadvantage or improving their likelihood of being presented to and selected by the user in the future.

Semantic information is information encapsulated in an ontology (e.g., a network of domains) implemented by the digital assistant. Each domain (e.g., a restaurant domain) includes one or more tasks (e.g., a restaurant search task and a restaurant reservation task) that can be executed by the digital assistant. The digital assistant that is capable of processing natural language input can execute a task in response to a wide variety of expressions of the same or similar user intent. Different acceptable expressions of a command for a task may involve different keywords and vocabulary associated with the task in the domain. For example, the user may use different input expressions, such as “Find a restaurant nearby,” “I am looking for a place to eat,” “Where can I get lunch around here?”, to request the digital assistant to conduct a restaurant search. Each domain or task is associated with one or more input phrases through the semantic connections between words in the input phrases and words in the vocabulary of the domain or task. Semantic information is useful in identifying completion candidates in a relevant domain. However, it is difficult to uniformly and fairly compare the identified completion candidates across different domains, because different domains vary greatly in terms of their overall usage frequencies and complexity (e.g., as reflected by the complexity of acceptable commands, and the number of alternative expressions for a command, etc.). Conventionally, completion candidates identified based on semantic information are treated equally, and ranked arbitrarily in the list of completion candidates presented to the user, which can lead to inconsistent results and user confusion.

It is desirable to unify the different sources of predictive information within a single framework, such that a desired balance between frequency, recency, and semantic coverage information can be attained. Thoughtless and arbitrary ways of combining the frequency, recency, and semantic information are unlikely to improve the prediction accuracy of the resulting list of completion candidates; instead, they may cause more confusion for the user due to the arbitrariness in the identification and ranking of the completion candidates.

The present disclosure describes a technique that addresses the above needs and problems. As will be explained in more detail later, in some embodiments, a respective word indexing power of each word in a corpus of known complete input phrases (e.g., all expressions of commands known to have been accepted by a digital assistant) is computed, given a set of underlying domains (e.g., all domains of service implemented by the digital assistant, such as the movie domain, restaurant search domain, e-mail domain, weather domain, news domain, messaging domain, telephone call domain, etc.). In addition, a respective phrase indexing power is computed for each complete input phrase based on an aggregation of the respective word indexing powers of all constituent words of said complete input phrase. The phrase indexing power is a parameter for uniformly evaluating how representative a complete input phrase is with respect to each domain. Based on their respective phrase indexing powers, complete input phrases associated with different domains can be fairly and uniformly compared in terms of their likelihood of appearance in user inputs, despite the cross-domain differences in usage data density (or sparsity) and availability of alternative expressions.

In addition, as described in the present disclosure, the phrase indexing power information is combined with historic usage frequency data to provide a more uniform way of evaluating actual usage of the complete input phrases, with reduced bias due to the particularities across different domains. In particular, the word indexing power and the phrase indexing power of the complete input phrases are based on normalized entropy of the words in the complete input phrases. Thus, the usage frequency information is “entropy-weighted,” and the entropy-weighted information can be used to generate a unified ranking for all complete input phrases known to the digital assistant.

From the list of all complete input phrases, completion candidates (e.g., auto-completions to be presented in response to a current user input) are identified by matching current user input to the initial portions of the complete input phrases. The ranking of the identified completion candidates is based on the unified ranking of their corresponding complete input phrases in the list of all complete input phrases.

In some embodiments, recency information is incorporated into the entropy-weighted frequency information for individual users, such that a consistent ranking among all completion phrases can be tailored for the individual users.

Once a list of sorted completion candidates is presented to the user and the user makes a selection from the list, the selected completion candidate is passed to the digital assistant and a task corresponding to the selected completion candidate is then executed by the digital assistant.

More details of the identification and unified ranking of completion candidates are provided below.

FIG. 1 is a block diagram illustrating an exemplary environment 100 in which an auto-completion system 102 operates in accordance with some embodiments. As shown in FIG. 1, the auto-completion system 102 is implemented as part of a digital assistant 104. In some embodiments, the digital assistant 104 is implemented in accordance with a client-server model, although such configuration is not required. In some embodiments, the auto-completion system 102 need not be a component of the digital assistant 104, and can be a standalone component in separate communication with the digital assistant 104 and a client device 106.

In the environment 100, the client device 106 captures a user input 108 (e.g., a speech input, a text input, or a combination of both) received from a user, e.g., using a microphone or keyboard coupled to the client device 106. If the user input 108 includes a speech input, the speech input is passed to a speech recognition system 110, which converts the speech input into a text string based on various techniques (e.g., various speech models, language models, and/or domain models). The speech recognition system 110 then passes the text string on to the auto-completion system 102 as input for the identification and ranking of auto-completion candidates for the user input 108. If the user input 108 includes a text input, the text input is passed directly to the auto-completion system 102 as input for the identification and ranking of auto-completion candidates.

As described in more detail below, in some embodiments, the auto-completion system 102 maintains a database 112 of all complete input phrases (e.g., N distinct complete input phrases) known to the digital assistant. The auto-completion system 102 has uniformly ranked all of these complete input phrases in the database 112 in accordance with the methods described herein. When the auto-completion system 102 identifies from the database 112 two or more complete input phrases that match the current user input, the identified matching phrases are already sorted in accordance with their existing ranks within the database 112. In some embodiments, the auto-completion system 102 uses various criteria to further trim down and adjust the order of the identified matching phrases, before a ranked list of auto-completion candidates 114 is determined from the identified matching phrases.

In some embodiments, once the auto-completion system 102 has compiled the ranked list of auto-completion candidates 114 for the user input 108, the auto-completion system 102 sends the list 114 to the client device 106, where the list 114 is presented to the user for selection.

In some embodiments, if a selection input 116 is received from the user, and the selection input selects one of the auto-completion candidates presented to the user, the auto-completion system 102 determines whether the selected auto-completion candidate includes any placeholder or template words (e.g., the template word "Cuisine Type" in the auto-completion "Find <CUISINE TYPE> restaurant nearby"). If the selected auto-completion candidate includes any placeholder or template words, the auto-completion system 102 optionally provides to the user additional suggestions or auto-completions for the user to specify a particular instance of the template words (e.g., "Chinese" or "Italian" are particular instances of the template word "Cuisine Type"). In some embodiments, the user optionally continues to type out the specific values of the template word(s) in the selected auto-completion. In some embodiments, a selected auto-completion having multiple template words may require more than one round of interactions (e.g., presentation and selection of auto-completions) with the user to be fully instantiated. In some embodiments, once the auto-completion system 102 determines that all (if any) template words in the selected auto-completion have been instantiated by the user, the auto-completion system 102 forwards the fully instantiated auto-completion to the task execution system 118 of the digital assistant.

In some embodiments, once the fully-instantiated auto-completion is passed on to the task execution system 118, the task execution system 118 parses the auto-completion to determine whether additional dialogue with the user (e.g., to obtain additional parameters) is needed to carry out a corresponding task. If no further dialogue is necessary, the task execution system 118 proceeds to execute the task in accordance with the parameters specified in the fully-instantiated auto-completion.

More details about how the digital assistant processes commands, carries on dialogues, and executes tasks are provided in Applicant's prior U.S. Utility application Ser. No. 12/987,982 for “Intelligent Automated Assistant,” filed Jan. 10, 2011, and U.S. Utility application Ser. No. 13/831,669 for “Crowd Sourcing Information to Fulfill User Requests,” filed Mar. 15, 2013, the entire disclosures of which are incorporated herein by reference.

FIG. 1 is merely illustrative, and other configurations of an operating environment for the auto-completion system 102 are possible in accordance with various embodiments. For example, although the auto-completion system 102 is shown as a part of a digital assistant server 104 in FIG. 1, in some embodiments, the auto-completion system 102 is a standalone system in communication with a digital assistant and/or a user device through one or more networks. In some embodiments, the auto-completion system 102 is a sub-system of a digital assistant client residing on the user device 106.

FIG. 2 is a block diagram of an exemplary auto-completion system 102 in accordance with some embodiments. The auto-completion system 102 includes an input processing module 202, a candidate identification module 204, a candidate presentation module 206, a selection processing module 208, and a command transmission module 210. The input processing module 202, candidate identification module 204, candidate presentation module 206, selection processing module 208, and command transmission module 210 are activated during real-time interaction with the user. As described herein, real-time refers to a timeframe within the current user session, during the normal back-and-forth flow of input and feedback between the user and the device.

In addition, the auto-completion system 102 further includes a command collection and unified ranking module 212. The command collection and unified ranking module 212 accesses and optionally manages a plurality of information sources, including a domain data store 214, a usage frequency data store 216, and a usage recency data store 218. The command collection and unified ranking module 212 also manages the database 112 of complete input phrases (e.g., a list of N complete input phrases that are known to be acceptable to the digital assistant), ranked according to the integrated ranking methods described herein.

In some embodiments, the domain data store 214 includes information on the degrees of association between particular words and concepts and different domains of service offered by the digital assistant. In some embodiments, the digital assistant (e.g., the digital assistant 104 in FIG. 1) implements an ontology of domains which organizes data (e.g., the vocabulary, tasks, dialogue models, task models, and service models) used in different domains of service in respective clusters of interconnected nodes representing concepts germane to the different domains of service. Different domains may overlap in vocabulary and concepts used to express user commands for invoking particular tasks in the respective domains. In some embodiments, the digital assistant maintains a list of acceptable commands or complete input phrases for each domain of service or task. For example, in the restaurant domain, input phrases such as "Find a restaurant nearby," "Find me a place to eat," and "Search for a good pizza joint," are all acceptable commands or input expressions sufficient to cause the digital assistant to execute a restaurant search task. Similarly, in the telephone call domain, input phrases such as "Call mom," "Telephone Jason," and "Call xxx-xxx-xxxx" are all acceptable commands or complete input phrases sufficient to cause the digital assistant to execute a telephone call function on behalf of the user to a specified callee. Depending on how popular or well-known a particular domain of service is, some domains of service will be more frequently invoked than others. In addition, as more than one alternative expression may be used to invoke a particular task in a particular domain, some of the alternative expressions may be more frequently used than others in the particular domain. In some embodiments, the domains, and the known commands and input expressions for each domain and each task, are stored in the domain data store 214. In some embodiments, the command collection and unified ranking module 212 manages the domain data store 214. In some embodiments, the digital assistant manages the domain data store 214 and provides the command collection and unified ranking module 212 with access to the domain data store 214.
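
For concreteness, the domain data store might be pictured as a nested mapping from domains to tasks to their known acceptable input phrases, as sketched below; this layout is an illustration only and is not the schema described in the disclosure.

```python
# Hypothetical layout: domain -> task -> known acceptable input phrases.
domain_data_store = {
    "restaurant": {
        "restaurant_search": [
            "Find a restaurant nearby",
            "Find me a place to eat",
            "Search for a good pizza joint",
        ],
    },
    "telephone_call": {
        "place_call": [
            "Call mom",
            "Telephone Jason",
            "Call <Callee>",        # a template input phrase
        ],
    },
}
```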

In some embodiments, the command collection and unified ranking module 212 logs and/or processes usage frequency information and usage recency information from the interactions between user(s) and the digital assistant (e.g., the digital assistant 104 in FIG. 1). For example, in some embodiments, the command collection and unified ranking module 212 aggregates how frequently and/or how many times each user has entered a particular input phrase that has been correctly interpreted by the digital assistant and produced a satisfactory response to the user. In some embodiments, the command collection and unified ranking module 212 tags the usage frequency information with various context information (e.g., time, location, user, language, etc.) collected at the time when particular input phrases were submitted to the digital assistant. In some embodiments, the command collection and unified ranking module 212 uses the context tags to determine the respective usage frequency information under different contexts (e.g., time, location, user, language, etc.). In some embodiments, the usage frequency under a particular context can be used to provide a boost for particular input phrases when determining whether to present the input phrase as an auto-completion candidate. In some embodiments, the command collection and unified ranking module 212 stores the usage frequency information in a usage frequency data store 216. In some embodiments, the digital assistant manages the usage frequency data store 216 and provides the command collection and unified ranking module 212 with access to the usage frequency data store 216.

Similarly, the command collection and unified ranking module 212 keeps track of the time when each particular input phrase has last been used to make a request to the digital assistant. In some embodiments, a recency score is assigned to each known input phrase, and the score for each known input phrase is automatically decreased over time if the phrase is not reused by a user, and refreshed to its full value if it is reused by the user. Other methods of keeping the recency information are possible. In some embodiments, the command collection and unified ranking module 212 stores the usage recency information in a usage recency data store 218. In some embodiments, the digital assistant manages the usage recency data store 218 and provides the command collection and unified ranking module 212 with access to the usage recency data store 218.
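
As one possible realization of such a decaying recency score (a sketch only; the exponential decay and the 30-day half-life are assumptions, not the method stated above):

```python
import time

def recency_score(last_used_ts, now=None, half_life_days=30.0):
    """Recency score that decays over time and returns to its full value (1.0)
    whenever the phrase is reused (i.e., whenever last_used_ts is refreshed)."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_used_ts) / 86400.0)
    return 0.5 ** (age_days / half_life_days)

# Reusing a phrase refreshes its timestamp, restoring the score to 1.0:
# last_used["Call mom"] = time.time()
```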

In some embodiments, the command collection and unified ranking module 212 processes the information in the domain data store 214, the usage frequency data store 216, and the usage recency data store 218 to score and rank all complete input phrases collected during past interactions between the user and the digital assistant. In some embodiments, the command collection and unified ranking module 212 generates a ranked list of all known complete input phrases across all domains implemented by the digital assistant. The ranked list of all known complete input phrases is stored in the database 112, and can be retrieved based on the user input currently received from the user.

In some embodiments, the command collection and unified ranking module 212 operates on the backend to continuously or periodically improve and update the ranked list of complete input phrases, which is utilized by the candidate identification module 204 to identify and rank auto-completion candidates for a particular partial user input currently received from a user.

In some embodiments, the list of complete input phrases in the database 112 includes explicit input phrases containing only normal words (i.e., static, invariable words, words that are not altered or replaced during use), such as the input phrases "Turn on the media player" and "Show location on map." In some embodiments, the database 112 further includes "template input phrases" which contain both normal words and template words (i.e., words that serve as a generalization of a respective category of normal words). For example, a template input phrase "Call <Callee>" includes a normal word "call" and a template word "callee," where the template word "callee" represents a category of normal words that each specify a respective callee of a telephone call. Examples of the normal words that instantiate the template word "callee" include "mom," "my dentist," "Mr. Bond," "911," "emergency," "the office," etc. Sometimes, a template input phrase may include more than one template word. For example, the template input phrase "Make a reservation for <party size> at <restaurant name> for <time>" includes both normal words (e.g., "make," "a," "reservation," "for," and "at") and template words (e.g., "party size," "restaurant name," and "time"). In some embodiments, template input phrases are derived based on repeated patterns present in multiple explicit input phrases found in the interaction log of the digital assistant. Therefore, more than one explicit input phrase may be instantiated from a template input phrase by using different normal words to substitute for the template words. In some embodiments, by including template input phrases in the database 112, the applicability of the auto-completion system 102 may be broadened without significantly increasing the storage requirement for the database 112. In general, a template input phrase (e.g., "Call <callee>") is more representative than a particular instantiation (e.g., "Call mom") of the template input phrase. Similarly, a template word (e.g., "<callee>") is more representative than a particular normal word (e.g., "mom" or "Mr. Bond") that is an instance of the template word.

In some embodiments, after the ranked list of all known complete input phrases has been prepared according to the methods described herein, the ranked list can be used to provide candidate completions to the user in accordance with the current user input.

In particular, the input processing module 202 receives a user input (e.g., a text input, a speech input, and optionally other multi-modal inputs and context information) directed to the digital assistant. The input processing module 202 determines a text string from the received user input (e.g., using speech-to-text or just the text that has been entered), and forwards the text string to the candidate identification module 204. The candidate identification module 204 accesses the ranked list of all known complete input phrases in the database 112 to identify a subset of complete input phrases that match the text string. For example, if the user input is converted to a text string "Find," the candidate identification module 204 generates a database query to the database 112 and retrieves all complete input phrases that start with the word "Find," such as "Find restaurants . . . " "Find gas stations," "Find a pizza place . . . " "Find a doctor . . . ", etc.

In some embodiments, the candidate identification module 204 does not require an exact match between the text input and the beginning of the complete input phrases to select the subset of complete input phrases. For example, in some embodiments, the candidate identification module 204 treats the input text string ("Find") as a keyword, and as long as an input phrase (e.g., "Where can I find a place to eat," "Search for restaurants," "Look for the next rest stop," etc.) sufficiently matches the keyword (e.g., in terms of underlying semantics), the input phrase is optionally included in the subset of complete input phrases from which auto-completion candidates for the user input are then identified. In some embodiments, the candidate identification module 204 only uses fuzzy or partial matching if exact matching does not produce a sufficient number of complete input phrases from which to identify the auto-completion candidates.

Once the candidate identification module 204 has obtained the subset of complete input phrases that match the input text string received from the input processing module 202, the candidate identification module 204 optionally processes the subset of complete input phrases to filter out near-duplicates (e.g., "Find Chinese restaurants in Palo Alto" and "Find restaurants in Palo Alto that serve Chinese food"). In some embodiments, the candidate identification module 204 selects only a predetermined number (e.g., no more than four) of input completions from each domain represented in the subset of matching complete input phrases. Choosing only a limited number of candidates from each domain ensures that a diversity of domains can be represented in a reasonable number of auto-completion candidates ultimately presented to the user. In some embodiments, the candidate identification module 204 uses other criteria to further limit or expand the list of auto-completion candidates selected from the subset of matching complete input phrases. Since the list of all known complete input phrases in the database 112 is already ranked, the ranking of the auto-completion candidates is automatically available based on the ranking of the subset of matching complete input phrases.

In some embodiments, once the candidate identification module 204 has identified the list of auto-completion candidates for the current input, the candidate identification module 204 passes the identified list of auto-completion candidates to the candidate presentation module 206. The candidate presentation module 206 formats the auto-completion candidates in accordance with various formats suitable for the current user interface. In some embodiments, the auto-completion candidates are presented in an expanded drop-down list that emanates from the text input area in which the user has typed the current user input. In some embodiments, the auto-completion candidates are presented in a window of a graphical user interface of the digital assistant. In some embodiments, the auto-completion candidates are presented below a textual echo of a speech input currently received from the user.

In some embodiments, the user may continue to speak or type additional characters and words to the digital assistant without making any selection from the list of auto-completion candidates currently presented to the user. In some embodiments, the list of auto-completion candidates presented to the user is dynamically updated as the user adds each additional character or word to the current input. Each time the user adds an additional character or word to the current input, the input processing module 202 captures and passes the added character or word to the candidate identification module 204. The candidate identification module 204 filters the subset of matching complete input phrases such that only input phrases matching the current input (i.e., the original input followed by the added character) are preserved. In some embodiments, the candidate identification module 204 performs additional steps to ensure diversity and size of the candidate list as described above. For example, a previously identified auto-completion candidate may be eliminated because it no longer matches the current input. Similarly, a previously low-ranked input phrase in the subset of complete input phrases may be selected as an auto-completion candidate because its relative rank within the subset has improved due to elimination of some other input phrases from the subset of input phrases.

Once the updated list of auto-completion candidates has been identified, the candidate identification module 204 sends the updated list of auto-completion candidates to the candidate presentation module 206, which in turn presents the updated list of auto-completion candidates to the user.

In some embodiments, if the user selects one of the auto-completion candidates presented to the user, the user selection is captured by the selection processing module 208. The selection processing module 208 determines whether the selected auto-completion candidate still includes any un-instantiated template word. For example, an auto-completion candidate "Find a restaurant near me" does not include any un-instantiated template word and is a complete input command to the digital assistant. In contrast, "Make me a reservation at <Restaurant Name>" includes an un-instantiated template word "<Restaurant Name>." If the selected auto-completion candidate does not include any un-instantiated template word, the selection processing module 208 forwards the selected auto-completion candidate to the command transmission module 210, and the command transmission module 210 then forwards the selected auto-completion candidate to the digital assistant as a user command. If the selected auto-completion candidate includes at least one un-instantiated template word, the selection processing module 208 optionally sends the selected auto-completion candidate to the candidate identification module 204, where the candidate identification module 204 identifies one or more normal words corresponding to the template word in the selected auto-completion candidate, and the candidate presentation module 206 presents the one or more normal words identified by the candidate identification module 204 as suggestions for the template word. In some embodiments, the candidate identification module 204 uses the user's personal long-term information, such as the user's contacts, applications, emails, and documents, current location, etc., as sources of information for suggesting the normal words for instantiating the template word.

In some embodiments, if an auto-completion candidate includes multiple template words and is divided into multiple parts by the template words, only part of the multi-part completion candidate is presented at a time. Later parts of the multi-part auto-completion candidate are each presented to the user when an earlier part has been selected and fully instantiated with normal words. For example, for the multi-part completion candidate "Make me a reservation at <Restaurant Name> for <Date>, <Time> for <Party Size>," only the initial part "Make me a reservation at <Restaurant Name>" is presented in response to the user input "Make." Once this auto-completion has been selected, and the user has entered a restaurant name (e.g., through explicit typing or selection of an auto-suggestion) for the template word "<Restaurant Name>," the subsequent part "for <Date>" is presented to the user. The process of receiving user input instantiating (i.e., specifying the normal words for) the template word and presenting the subsequent part of the multi-part auto-completion continues until all parts of the multi-part auto-completion have been presented and all template words have been fully instantiated. Once the selection processing module 208 determines that the selected auto-completion no longer has any un-instantiated template word, the selection processing module 208 passes the fully-instantiated multi-part auto-completion to the command transmission module 210. The command transmission module 210 then causes the digital assistant to execute a task in accordance with the fully-instantiated multi-part auto-completion.
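
The staged presentation described above can be sketched as follows; the segmentation of a candidate into plain text and angle-bracketed template words, as well as the function name, are assumptions made for illustration.

```python
def next_presentation_part(candidate_segments, instantiated):
    """Return the portion of a multi-part auto-completion to display, stopping
    at (and including) the first template word not yet instantiated by the user."""
    shown = []
    for segment in candidate_segments:
        if segment.startswith("<") and segment.endswith(">"):   # a template word
            if segment in instantiated:
                shown.append(instantiated[segment])              # already filled in by the user
            else:
                shown.append(segment)
                break                                            # wait for the user to instantiate it
        else:
            shown.append(segment)
    return " ".join(shown)

# candidate = ["Make me a reservation at", "<Restaurant Name>", "for", "<Date>"]
# next_presentation_part(candidate, {})
#   -> "Make me a reservation at <Restaurant Name>"
# next_presentation_part(candidate, {"<Restaurant Name>": "the Rock Café"})
#   -> "Make me a reservation at the Rock Café for <Date>"
```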

FIG. 2 is illustrative of the functions and components of an exemplary auto-completion system 102. As a person skilled in the art would recognize, the auto-completion system may include more or fewer components than those shown in FIG. 2. Some functions described above may be implemented by components other than those shown in FIG. 2. Some functions and modules may be combined into fewer modules or further divided into other sub-functions and sub-modules.

FIG. 3 is a block diagram of a system 300 that provides entropy-weighted phrase-level auto-completions in accordance with some embodiments. The system 300 includes one or more processing units (or "processors") 302, memory 304, an input/output (I/O) interface 306, and a network communications interface 308. These components communicate with one another over one or more communication buses or signal lines 310. In some embodiments, the memory 304, or the computer readable storage media of memory 304, stores programs, modules, instructions, and data structures including all or a subset of: an operating system 312, an I/O module 314, a communication module 316, and a digital assistant system 104. The one or more processors 302 are coupled to the memory 304 and operable to execute these programs, modules, and instructions, and to read from and write to the data structures.

In some embodiments, the processing units 302 include one or more microprocessors, such as a single core or multi-core microprocessor. In some embodiments, the processing units 302 include one or more general purpose processors. In some embodiments, the processing units 302 include one or more special purpose processors. In some embodiments, the system 300 is implemented on one or more personal computers, mobile devices, handheld computers, tablet computers, or any of a wide variety of hardware platforms that contain one or more processing units and run various operating systems.

In some embodiments, the memory 304 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some embodiments, the memory 304 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some embodiments, the memory 304 includes one or more storage devices remotely located from the processing units 302. The memory 304, or alternately the non-volatile memory device(s) within the memory 304, comprises a computer readable storage medium.

In some embodiments, the I/O interface 306 couples input/output devices, such as displays, keyboards, touch screens, speakers, and microphones, to the I/O module 314 of the auto-completion system 300. The I/O interface 306, in conjunction with the I/O module 314, receives user inputs (e.g., voice input, keyboard inputs, touch inputs, etc.) and processes them accordingly. The I/O interface 306 and the I/O module 314 also present outputs (e.g., sounds, images, text, etc.) to the user according to various program instructions implemented on the auto-completion system 300.

In some embodiments, the network communications interface 308 includes wired communication port(s) and/or wireless transmission and reception circuitry. The wired communication port(s) receive and send communication signals via one or more wired interfaces, e.g., Ethernet, Universal Serial Bus (USB), FIREWIRE, etc. The wireless circuitry receives and sends RF signals and/or optical signals from/to communications networks and other communications devices. The wireless communications may use any of a plurality of communications standards, protocols and technologies, such as GSM, EDGE, CDMA, TDMA, Bluetooth, Wi-Fi, VoIP, Wi-MAX, or any other suitable communication protocol. The network communications interface 308 enables communication between the system 300 and networks, such as the Internet, an intranet and/or a wireless network, such as a cellular telephone network, a wireless local area network (LAN) and/or a metropolitan area network (MAN), and other devices. The communication module 316 facilitates communications between the system 300 and other devices over the network communications interface 308.

In some embodiments, the operating system 312 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communications between various hardware, firmware, and software components.

In some embodiments, the system 300 is implemented on a standalone computer system. In some embodiments, the system 300 is distributed across multiple computers. In some embodiments, some of the modules and functions of the system 300 are divided into a server portion and a client portion, where the client portion resides on a user device and communicates with the server portion residing on a server device through one or more networks. It should be noted that the system 300 is only one example of an auto-completion system, and that the system 300 may have more or fewer components than shown, may combine two or more components, or may have a different configuration or arrangement of the components. The various components shown in FIG. 3 may be implemented in hardware, software, or firmware, including one or more signal processing and/or application-specific integrated circuits, or a combination thereof.

As shown in FIG. 3, the system 300 stores the auto-completion module 102 in the memory 304. In some embodiments, the auto-completion module 102 further includes the following sub-modules, or a subset or superset thereof: the input processing module 202, the candidate identification module 204, the candidate presentation module 206, the selection processing module 208, the command transmission module 210, and the command collection and unified ranking module 212. In addition, each of these sub-modules has access to one or more of the following data structures and data sources of the auto-completion module 102, or a subset or superset thereof: the database 112 containing the unified ranking of all known complete input phrases, the domain data store 214, the usage frequency data store 216, and the usage recency data store 218. More details on the structures, functions, and interactions of the sub-modules and data structures of the auto-completion module 102 are provided with respect to FIGS. 1-2 and 4A-4C, and accompanying descriptions.

FIGS. 4A-4C are flow charts of an exemplary process 400 for providing auto-completions for a user input directed to a digital assistant. The process 400 is implemented by an auto-completion system (e.g., the auto-completion system 102 or the system 300 in FIGS. 1-2 and 3) in accordance with some embodiments. In some embodiments, the method described below may be performed to provide cross-domain semantic ranking of input phrases for other contexts and applications.

In the exemplary process 400, an auto-completion module receives (402) a training corpus including a collection of input phrases. The collection of complete input phrases spans multiple semantically distinct domains (e.g., K semantically distinct domains) and includes a plurality of distinct complete input phrases (e.g., N distinct complete input phrases) comprising a plurality of distinct words (e.g., M distinct words). The numbers N, K, and M are natural numbers greater than one. In some embodiments, the collection of input phrases includes all known input phrases that have been used by one or more users to request performance of a task by the digital assistant. Examples of the input phrases include "Call mom," "Create a calendar invite for my birthday party," "Make a reservation for Saturday night at the Rock Café," and "E-mail this webpage to me," etc. The input phrases in the training corpus span multiple domains of services implemented by the digital assistant. For example, the input phrase "Call mom" is in a telephone call domain and relates to a task of making a telephone call to a specified callee; the input phrase "Create a calendar invite for my birthday party" is in a "Calendar" domain and relates to a task of creating a calendar invite; and the input phrase "Make a reservation for Saturday night at the Rock Café" is in the "Restaurant" domain and relates to a task of making a restaurant reservation. In some embodiments, the collection of input phrases is based on actual input phrases recorded in the usage data log of the digital assistant, and is representative of the actual relative usage frequencies of each unique input phrase during the historic interactions between users and the digital assistant. In other words, some input phrases such as "Call back" may appear many times in the training corpus, while the input phrase "Turn off the phone at 6 pm everyday" may appear only a few times in the training corpus.

In some embodiments, the collection of input phrases optionally includes, for each task in each domain, different expressions of the same request. For example, the input phrase “Call mom” and the input phrase “Telephone my mother” are different expressions for the same request to the digital assistant to initiate a telephone call to a telephone number associated with the user's mother.

In some embodiments, the collection of input phrases also includes template input phrases for similar requests of the same type. An example template input phrase “Make a reservation at <Entity Name> for <Reservation Date> at <Time> for <Party Size>” includes invariable parts (e.g., shown as underlined text within the quotes) and variable parts (e.g., shown as text within angular brackets). The invariable parts of a template input phrase consist of normal words. The variable part(s) of a template input phrase are each represented by a respective template word or variable name, such as “<Entity Name>,” “<Reservation Date>,” “<Time>,” and “<Party Size>.” When matching a current input received from a user to a known input phrase found in the training corpus, a normal word in the known input phrase is matched by comparing the corresponding sequences of characters in the current user input and the normal words of the known input phrase. On the other hand, a template word of the known input phrase is matched by determining whether a sequence of word(s) in the current user input qualifies as an instance of the template word. For example, the sequence of words “the Rock Café” is an instance of the template word “Entity Name” because there is a restaurant entity having the name “Rock Café.” In another example, the word “Saturday” is an instance of the template word “Reservation Date.”
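
To make the matching behavior concrete, the sketch below checks a user word sequence against a known (possibly templated) input phrase; it simplifies by matching one user word per template slot, and the is_instance_of helper (e.g., one backed by an entity database) is an assumption introduced here, not part of the disclosure.

```python
def matches_known_phrase(user_words, phrase_words, is_instance_of):
    """Match a sequence of user words against a known (possibly templated)
    input phrase: normal words must match literally, while a template word
    (in angle brackets) matches if the user word qualifies as an instance of it."""
    if len(user_words) != len(phrase_words):
        return False
    for user_word, phrase_word in zip(user_words, phrase_words):
        if phrase_word.startswith("<") and phrase_word.endswith(">"):
            if not is_instance_of(user_word, phrase_word):
                return False
        elif user_word.lower() != phrase_word.lower():
            return False
    return True

# Example with a hypothetical helper:
# is_instance_of("Saturday", "<Reservation Date>") -> True
```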

In some embodiments, to gather the collection of complete input phrases for the training corpus, the auto-completion module collects (404) a plurality of user input phrases from a usage log of the digital assistant. The auto-completion module further identifies (406) at least one template input phrase based on a common phrase pattern present in two or more of the plurality of user input phrases. In some embodiments, the auto-completion module normalizes (408) the plurality of user input phrases by substituting at least one word in each of said two or more user input phrases with a respective template word representing a generalization of the at least one word. For example, in some embodiments, three different input phrases “Call mom,” “Call Mr. Bond,” and “Call my insurance agent” are optionally included in the training corpus as distinct input phrases. Alternatively, in some embodiments, the auto-completion module identifies a common phrase pattern present in these three distinct input phrases, and normalizes them into a single template input phrase “Call <Callee>.” The template word “Callee” is used to replace the respective words “mom,” “Mr. Bond,” and “my insurance agent” in the three distinct input phrases. In some embodiments, the template input phrase is included in the training corpus in place of the three distinct input phrases from which the template input phrase has been abstracted. In some embodiments, the usage frequency and usage recency data attributable to each of the three distinct input phrases are aggregated and attributed to the template input phrase. Therefore, using template input phrases in place of more specific input phrases in the training corpus reduces data sparsity issues for some input phrases in some domains. In addition, the use of template input phrases also reduces the number of input phrases that need to be stored in the training corpus. In some embodiments, both the template input phrases and at least some of the more specific input phrases from which the template phrases have been abstracted are included in the training corpus. For example, if there are a sufficient number of occurrences of the input phrase “call mom” in the usage data log, the normal input phrase “call mom” is optionally included in the training corpus along with the template input phrase “Call <Callee>.”
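As a rough illustration of this normalization step, the following Python sketch (not part of the patented method; the usage log, callee list, and counts are hypothetical) collapses logged “Call …” phrases into the template input phrase “Call <Callee>” and aggregates their usage counts:

```python
from collections import Counter

# Hypothetical usage log and callee list, for illustration only.
usage_log = ["Call mom", "Call Mr. Bond", "Call my insurance agent",
             "Call mom", "Call back"]
known_callees = {"mom", "Mr. Bond", "my insurance agent"}

def normalize(phrase):
    """Replace a recognized callee name with the template word '<Callee>'."""
    for callee in known_callees:
        if phrase == "Call " + callee:
            return "Call <Callee>"
    return phrase  # phrases with no recognized callee are kept as-is

# Aggregate usage counts onto the template input phrase.
corpus_counts = Counter(normalize(p) for p in usage_log)
print(corpus_counts)  # Counter({'Call <Callee>': 4, 'Call back': 1})
```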

In some embodiments, in the process 400, for each of the M distinct words (e.g., the word “call”) in the collection of complete input phrases, the auto-completion module calculates (410) a respective word indexing power across the K domains of service based on a respective normalized entropy for said word (e.g., the word “call”). In some embodiments, the respective normalized entropy of a word (e.g., the word “call”) is based on a total number of domains in which said word (e.g., the word “call”) appears and how representative said word is for each of the K domains. For example, in a typical training corpus, the word “call” would be highly representative for the “telephone call” domain, but less representative for the “directions” domain, even though the word “call” may have appeared in one or more input expressions of both domains (e.g., in “Call mom” for the telephone call domain, and “Find the directions to the customer call center” for the directions domain).

In some embodiments, to calculate the respective word indexing power across the K domains of service, the auto-completion module calculates (412) the respective normalized entropy for each word based on a respective total number of times said word occurs in each domain, and a total number of times said word occurs in the collection of complete input phrases. In some embodiments, to calculate the respective word indexing power across the K domains of service for each word wi of the M distinct words, the auto-completion module calculates (412) the respective normalized entropy εi for the word wi based on a respective formula

$$\varepsilon_i = -\frac{1}{\log K}\sum_{k=1}^{K}\frac{c_{i,k}}{t_i}\log\frac{c_{i,k}}{t_i},$$

where ci,k is a total number of times wi occurs in a domain dk of the K domains of service, and ti=Σkci,k is a total number of times wi occurs in the collection of complete input phrases, and wherein the respective word indexing power of the word wi is (1−εi). By definition, the respective normalized entropy εi for the word wi is greater than or equal to zero, and less than or equal to unity. The respective normalized entropy εi for the word wi is equal to zero if and only if ci,k (i.e., the total number of times wi occurs in a domain dk) is equal to ti for a single domain dk (and is zero for all other domains). A value of εi approaching zero indicates that the word wi is essentially present in only one domain. In other words, a value of εi approaching zero indicates that the word wi is very representative of the domain(s) in which the word wi is present. The respective normalized entropy εi for the word wi is equal to unity if and only if ci,k (i.e., the total number of times wi occurs in a domain dk) is equal to ti/K for every domain dk. A value of εi approaching unity indicates that the word wi is evenly distributed across the K domains. In other words, a value of εi approaching unity indicates that the word wi is not representative of any of the domains in which the word wi is present. The value 1−εi is therefore a measure of the indexing power of the word wi across the K domains.
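To make the computation concrete, the following Python sketch (an illustrative implementation, not code from this disclosure; the per-domain counts are hypothetical and K > 1 is assumed) computes εi and the word indexing power (1−εi) from per-domain occurrence counts:

```python
import math

def word_indexing_powers(domain_counts, K):
    """Return (1 - eps_i) for each word, where eps_i is the normalized entropy
    of the word's distribution over the K domains (assumes K > 1)."""
    powers = {}
    for word, counts in domain_counts.items():
        t_i = sum(counts.values())          # total occurrences of the word
        entropy = 0.0
        for c in counts.values():
            if c > 0:
                p = c / t_i                 # c_{i,k} / t_i
                entropy -= p * math.log(p)
        eps_i = entropy / math.log(K)       # normalize by log K
        powers[word] = 1.0 - eps_i          # word indexing power
    return powers

# Hypothetical per-domain counts: "call" is split over two domains,
# "reservation" occurs in only one.
counts = {
    "call": {"telephone call": 90, "directions": 10},
    "reservation": {"restaurant": 50},
}
print(word_indexing_powers(counts, K=3))
# e.g., {'call': 0.70..., 'reservation': 1.0}
```

As expected, the word that is concentrated in a single domain receives the maximum indexing power of 1.0, while the word spread over two domains receives a lower value.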

In some embodiments, for each of the N distinct complete input phrases, the auto-completion module further calculates (414) a respective phrase indexing power across the K domains of service based on an aggregation of the respective word indexing powers of all constituent words of said complete input phrase.

In some embodiments, to calculate the respective phrase indexing power across the K domains of service for each input phrase Pj of the N distinct complete input phrases, the auto-completion module distinguishes (416) template words from normal words in the input phrase Pj. The auto-completion module then calculates (418) the respective phrase indexing power for the complete input phrase based on a total number of normal words present in the complete input phrase and a total number of template words present in the complete input phrase, where the total number of template words is given a respective template bias multiplier in the calculation of the respective phrase indexing power. For example, for the input phrase “Schedule meeting with <Person> on <Day>,” the respective normalized entropies for the normal words (e.g., “schedule,” “meeting,” “with,” “on”) and the respective normalized entropies for the template words (e.g., “<Person>” and “<Day>”) are accumulated differently. Specifically, a template bias is added to the phrase indexing power of the input phrase for each template word present in the input phrase. In other words, the phrase indexing power is biased to favor input phrases having a large number of template words over input phrases having fewer or no template words.

In some embodiments, the auto-completion module calculates the respective phrase indexing power μj for the input phrase Pj based on a respective formula

$$\mu_j = b_T\,n_T(j) + \frac{1}{n_N(j)+n_T(j)}\left[\sum_{i=1}^{n_N(j)}(1-\varepsilon_i)+\sum_{i=1}^{n_T(j)}(1-\varepsilon_i)\right],$$

where nN(j) is a total number of normal words present in the input phrase Pj, nT(j) is a total number of template words present in the input phrase Pj, (1−εi) is the respective word indexing power of each word wi, and bT is a respective template bias multiplier used to calculate the weight bias bTnT(j) for the input phrase Pj. In some embodiments, the respective template bias multiplier is a positive real number. The template bias multiplier controls the amount of template bias added to the average of the word indexing powers in the input phrase. In some embodiments, the auto-completion module adjusts (420) the respective template bias multiplier based on a performance evaluation of the cross-domain ranking in providing auto-completion candidates for user input.
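A minimal sketch of this calculation, assuming the word indexing powers have already been computed and using an arbitrary illustrative value of bT, could look like the following:

```python
def phrase_indexing_power(normal_powers, template_powers, b_T):
    """Compute mu_j = b_T * n_T(j) + the average word indexing power of the
    phrase, following the formula above."""
    n_N, n_T = len(normal_powers), len(template_powers)
    average = (sum(normal_powers) + sum(template_powers)) / (n_N + n_T)
    return b_T * n_T + average

# Hypothetical word indexing powers for "Make a call to <Callee>",
# with an arbitrary template bias multiplier.
mu = phrase_indexing_power(
    normal_powers=[0.3, 0.1, 0.7, 0.2],  # "make", "a", "call", "to"
    template_powers=[0.9],               # "<Callee>"
    b_T=0.05,
)
print(round(mu, 2))  # 0.05 * 1 + 2.2 / 5 = 0.49
```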

In the process 400, the auto-completion module further integrates the respective phrase indexing powers of the complete input phrases with respective usage frequencies of the complete input phrases, where both the phrase indexing powers and the usage frequencies have been normalized across the K domains for cross-domain comparison. In some embodiments, in the process 400, the auto-completion module obtains (422) the respective domain-specific usage frequencies of the N complete input phrases in the training corpus. The auto-completion module generates (424) a cross-domain ranking of the N complete input phrases based at least on the respective phrase indexing powers of the N complete input phrases and the respective domain-specific usage frequencies of the N complete input phrases.

In some embodiments, for each input phrase Pj, the auto-completion module normalizes (426) the respective domain-specific frequency of said input phrase Pj by a maximum phrase count of a single input phrase observed in the training corpus. Specifically, a phrase count qj of the input phrase Pj is normalized by the maximum phrase count of a single input phrase observed in the training corpus, i.e.,

$$\max_{l \in d_k} q_l,$$

to obtain normalized frequency

$$v_j = \frac{q_j}{\max_{l \in d_k} q_l}.$$

For example, suppose that the input phrase “Call back” has occurred twenty thousand times in the training corpus. Suppose that the maximum phrase count of a single input phrase observed in the training corpus is twenty-five thousand, which is the respective phrase count of the most popular input phrase “Call <Callee>” observed in the training corpus. The respective usage frequency of the input phrase “Call back” is therefore normalized by dividing twenty thousand by twenty-five thousand. In fact, the respective usage frequencies of all input phrases are each normalized by dividing the respective phrase count of the input phrase by the maximum count of twenty-five thousand observed in the training corpus. After normalization, the usage frequency of an input phrase is ready to be combined with the respective phrase indexing power of the input phrase for cross-domain comparisons with other input phrases.
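A minimal sketch of this normalization step, using the illustrative counts from the example above, might look like the following:

```python
# Hypothetical raw phrase counts q_j; "Call <Callee>" is the most frequent phrase.
phrase_counts = {
    "Call <Callee>": 25_000,
    "Call back": 20_000,
    "Make a reservation at <Entity Name>": 5_000,
}

# Normalize every count by the maximum phrase count observed in the corpus.
max_count = max(phrase_counts.values())
normalized_frequency = {p: q / max_count for p, q in phrase_counts.items()}
print(normalized_frequency["Call back"])  # 20000 / 25000 = 0.8
```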

In some embodiments, when generating the cross-domain ranking of the N distinct complete input phrases, the auto-completion module calculates (428) a respective integrated ranking score for each of the N complete input phrases based on the respective normalized frequency of said complete input phrase and the respective phrase indexing power of said complete input phrase. Then, the auto-completion module further generates (430) the cross-domain ranking of the N complete input phrases based on the respective integrated ranking scores for the N complete input phrases.

In some embodiments, when calculating the respective integrated ranking score Rj for each input phrase Pj, the auto-completion module calculates the respective integrated ranking score Rj for the input phrase Pj based on a respective formula

$$R_j = \frac{\omega_v v_j + \omega_\mu \mu_j}{\omega_v + \omega_\mu},$$

where vj is a respective frequency of the input phrase Pj that has been normalized for cross-domain comparison, μj is a respective phrase indexing power of the input phrase Pj across the K domains of service, and ωv and ωμ are relative weights given to domain-specific frequency and cross-domain indexing power in the ranking of the N complete input phrases. The values of ωv and ωμ reflect the relative importance given to frequency and indexing power information, respectively.
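For illustration only, the following sketch computes Rj from a normalized frequency and a phrase indexing power; the equal weights used here are illustrative defaults, not values specified in this disclosure:

```python
def integrated_score(v_j, mu_j, w_v=1.0, w_mu=1.0):
    """Weighted combination of normalized frequency v_j and phrase indexing
    power mu_j, per the formula above; equal weights are illustrative only."""
    return (w_v * v_j + w_mu * mu_j) / (w_v + w_mu)

# Using the illustrative values from the earlier sketches.
print(integrated_score(v_j=0.8, mu_j=0.49))  # (0.8 + 0.49) / 2 = 0.645
```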

In some embodiments, the cross-domain ranking of the N complete input phrases is used to provide auto-completion candidates to all users. In some embodiments, user-specific versions of the unified cross-domain ranking of the N complete input phrases are created by adding user-specific recency information to the ranking. For example, in some embodiments, if a particular user only uses certain input phrases for a few domains of service, those input phrases would appear in the recent interaction history of the particular user. In some embodiments, the ranking of each of these recently used input phrases can be boosted in the unified ranking of all input phrases to create a user-specific version of the ranking for providing auto-completion candidates for the particular user. In some embodiments, the boost based on user-specific recency information is applied directly to the rank of those input phrases that have appeared in the recent input history of the user. In some embodiments, the boost based on user-specific recency information is applied as a multiplier to the usage frequencies (or phrase counts) associated with the recently used input phrases. In some embodiments, the auto-completion module updates (432) the respective domain-specific frequency of each input phrase Pj based on a user-specific recency bias bR, where the user-specific recency bias is based on a number of times that a particular user has used the input phrase Pj (e.g., to communicate with the digital assistant) in a recent time window. In some embodiments, the phrase count qj is scaled to qj(1+bR) before the phrase count is normalized by the maximum phrase count of a single phrase observed in the training corpus.
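One possible realization of this recency boost, in which bR is assumed to grow linearly with the number of recent uses (an assumption made for this sketch; the disclosure only states that bR is based on the number of recent uses), is sketched below:

```python
def apply_recency_bias(q_j, recent_uses, bias_per_use=0.1):
    """Scale the phrase count q_j to q_j * (1 + b_R), where b_R is assumed here
    to grow linearly with the user's recent uses of the phrase."""
    b_R = bias_per_use * recent_uses
    return q_j * (1 + b_R)

# A phrase the user issued three times recently gets a boosted count before
# normalization; the figures are illustrative.
print(apply_recency_bias(q_j=20_000, recent_uses=3))  # 26000.0
```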

In some embodiments, the above steps of the process 400 are performed to generate the ranked list of all known complete input phrases, which is subsequently used to present (434) one or more auto-completions in response to an initial user input. When presenting auto-completions based on the current user input and the ranked list of all known complete input phrases, the following steps are performed by the auto-completion module.

In some embodiments, the auto-completion module receives (436) an initial user input from a user. In some embodiments, the auto-completion module identifies (438), from the N distinct complete input phrases, a subset of complete input phrases that each begins with the initial user input. For example, for the initial user input “Ma,” the subset of complete input phrases may include “Make a reservation at <Restaurant Name> for <Party Size> on <Day>,” “Make a left turn on Broadway,” “Match font size throughout,” “Make a call to <Callee>,” and “Maximize window.”

In some embodiments, when determining whether an input phrase in the training corpus begins with the initial user input, the auto-completion module matches the text of the initial user input with the text of the N distinct input phrases. In some embodiments, for normal text in the complete input phrase, an exact match to the initial input text is required. In some embodiments, for template words in the complete input phrase, a match is declared as long as the characters and/or words in the initial input text qualify as an instance of the template word in the complete input phrase. In some embodiments, once the subset of complete input phrases is identified, the auto-completion module ranks (440) the subset of complete input phrases in accordance with the cross-domain ranking of the N complete input phrases. For example,

Input Phrase                                                        | Rank in the training corpus | Rank in the subset
Make a reservation at <Restaurant Name> for <Party Size> on <Day>  | 20                          | 2
Make a left turn on Broadway                                        | 52                          | 3
Match font size throughout                                          | 67                          | 4
Make a call to <Callee>                                             | 12                          | 1
Maximize window                                                     | 102                         | 5

In some embodiments, the auto-completion module selects (442) a predetermined number of unique relevant domains based on the respective domains associated with each of the plurality of candidate phrases and the respective ranks of the candidate phrases associated with the unique relevant domains. In some embodiments, the auto-completion module selects (444) at least a top-ranked candidate phrase from each of the unique relevant domains as one of the auto-completion candidates to be presented to the user. In the above example, four domains (e.g., the restaurant reservation domain, the directions domain, the document editing domain, and the telephone call domain) are represented in the subset of input phrases. In the document editing domain, two input phrases (e.g., “Match font size throughout” and “Maximize window”) match the current input phrase. Suppose that in some embodiments, the auto-completion module only presents completion candidates from three top-ranked domains. In the above example, only the top-ranked phrases from the telephone call domain, the restaurant reservation domain, and the directions domain are identified as completion candidates for the current user input. Selecting a top candidate phrase from each domain improves the diversity in the candidate completions presented to the user. In some embodiments, one input phrase from each domain represented in the subset is identified as a completion candidate. In the above example, all input phrases in the subset except for “Maximize window” are identified as completion candidates for the current input “Ma.” However, if the user does not select any of the presented completion candidates, but instead continues to type “Maxim,” the presented input phrases would be eliminated from the subset because they no longer match the current user input, and the input phrase “Maximize window” will be identified as the only completion candidate for the current user input.
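The domain-diverse selection described above could be sketched as follows; the ranks and domain labels mirror the example table, and the limit of three domains is the illustrative value used in the example (not a value mandated by this disclosure):

```python
# Ranks and domains taken from the example table above; structure is illustrative.
ranked_subset = [
    (12, "Make a call to <Callee>", "telephone call"),
    (20, "Make a reservation at <Restaurant Name> for <Party Size> on <Day>",
     "restaurant reservation"),
    (52, "Make a left turn on Broadway", "directions"),
    (67, "Match font size throughout", "document editing"),
    (102, "Maximize window", "document editing"),
]

def select_candidates(subset, max_domains=3):
    """Walk the matching phrases in rank order, keeping only the top-ranked
    phrase of each domain, until max_domains distinct domains are covered."""
    chosen, seen_domains = [], set()
    for rank, phrase, domain in sorted(subset):
        if domain not in seen_domains:
            chosen.append(phrase)
            seen_domains.add(domain)
        if len(seen_domains) == max_domains:
            break
    return chosen

for candidate in select_candidates(ranked_subset):
    print(candidate)
# Prints the top phrase from the telephone call, restaurant reservation,
# and directions domains.
```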

In some embodiments, the auto-completion module presents template input phrases in a manner different from that for presenting normal input phrases. In particular, the text of a normal input phrase is presented to a user as a completion candidate all at once, while the text of a template input phrase is optionally presented to the user in several installments. In some embodiments, when presenting one or more auto-completion candidates, the auto-completion module displays (446) a first portion of a first auto-completion candidate (e.g., a template input phrase) of the one or more auto-completion candidates, where the first portion of the first auto-completion candidate precedes a respective template word in the first auto-completion candidate. The auto-completion module then receives (448) a subsequent user input specifying one or more normal words corresponding to the respective template word. In some embodiments, the auto-completion module displays (450) one or more suggestions corresponding to the respective template word based on a user-specific vocabulary comprising at least one of a plurality of proper nouns (e.g., person names, entity names, and application names) associated with the user, and where the subsequent user input is a selection of the one or more displayed suggestions. In some embodiments, the template word is presented to the user as a prompt (e.g., in a different color than the normal words), and the user optionally types over the template word to enter the normal words that instantiate the template word.

In response to receiving the subsequent user input, the auto-completion module displays (452) a second portion of the first auto-completion candidate succeeding the respective template word in the first auto-completion candidate. In some embodiments, the presentation of subsequent portions of the first auto-completion candidate and the receipt of subsequent user input instantiating each template word in the first auto-completion candidate continues until all template words have been instantiated by respective normal words. Once the first auto-completion candidate has been fully instantiated, the auto-completion module forwards the first auto-completion candidate to the digital assistant as a user command.
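A toy sketch of this installment-style instantiation, with the user interaction simulated by a list of pre-supplied replies (the helper function and its behavior are illustrative, not the implementation described in this disclosure), is shown below:

```python
import re

def instantiate_template(template_phrase, user_replies):
    """Simulate the installment flow: show the text up to each template word,
    take the user's words for that template word, then continue with the next
    portion. Here the interaction is collapsed into assembling the final command."""
    parts = re.split(r"(<[^>]+>)", template_phrase)  # keep template words as tokens
    replies = iter(user_replies)
    pieces = []
    for part in parts:
        if part.startswith("<") and part.endswith(">"):
            pieces.append(next(replies))  # user instantiates the template word
        else:
            pieces.append(part)           # invariable text is displayed as-is
    return "".join(pieces)

print(instantiate_template(
    "Make a reservation at <Entity Name> for <Time>",
    ["the Rock Café", "7 pm"],
))  # Make a reservation at the Rock Café for 7 pm
```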

The above process is merely illustrative. Not all steps are necessary. Unless explicitly specified, the steps may be executed in a different order than described above. A person of ordinary skill in the art would recognize that some variations of the above steps are possible and would be within the scope of the current disclosure.

Claims

1. A method of providing cross-domain semantic ranking of complete input phrases for a digital assistant, comprising:

receiving a training corpus comprising a collection of complete input phrases that span a plurality of semantically distinct domains;
for each of a plurality of distinct words present in the collection of complete input phrases, calculating a respective word indexing power across the plurality of domains based on a respective normalized entropy for said word, wherein the respective normalized entropy is based on a total number of domains in which said word appears and how representative said word is for each of the plurality of domains;
for each complete input phrase in the collection of complete input phrases, calculating a respective phrase indexing power across the plurality of domains based on an aggregation of the respective word indexing powers of all constituent words of said complete input phrase;
obtaining respective domain-specific usage frequencies of the complete input phrases in the training corpus; and
generating a cross-domain ranking of the collection of complete input phrases based at least on the respective phrase indexing powers of the complete input phrases and the respective domain-specific usage frequencies of the complete input phrases.

2. The method of claim 1, further comprising:

providing the cross-domain ranking of the collection of complete input phrases to a user device, wherein the user device presents one or more auto-completion candidates in response to an initial user input in accordance with at least the cross-domain ranking of the collection of complete input phrases.

3. The method of claim 1, wherein calculating the respective word indexing power across the plurality of domains for each word wi of the plurality of distinct words further comprises:

calculating the respective normalized entropy εi for the word wi based on a respective formula

$$\varepsilon_i = -\frac{1}{\log K}\sum_{k=1}^{K}\frac{c_{i,k}}{t_i}\log\frac{c_{i,k}}{t_i},$$

wherein K is a total number of domains in the plurality of domains, ci,k is a total number of times wi occurs in a domain dk of the plurality of domains, and ti=Σkci,k is a total number of times wi occurs in the collection of complete input phrases, and wherein the respective word indexing power of the word wi is (1−εi).

4. The method of claim 1, wherein calculating the respective phrase indexing power across the plurality of domains for each complete input phrase Pj of the collection of complete input phrases further comprises:

distinguishing template words from normal words in the complete input phrase Pj, a template word being a word that is used to represent a respective category of normal words in a particular complete input phrase and that is substituted by one or more normal words when provided as an input to the digital assistant by a user;
calculating the respective phrase indexing power μj for the complete input phrase Pj based on a respective formula

$$\mu_j = b_T\,n_T(j) + \frac{1}{n_N(j)+n_T(j)}\left[\sum_{i=1}^{n_N(j)}(1-\varepsilon_i)+\sum_{i=1}^{n_T(j)}(1-\varepsilon_i)\right],$$

wherein nN(j) is a total number of normal words present in the complete input phrase Pj, nT(j) is a total number of template words present in the complete input phrase Pj, (1−εi) is the respective word indexing power of each word wi, and bT is a respective template bias multiplier used to calculate the weight bias bTnT(j) for the input phrase Pj.

5. The method of claim 1, wherein generating the cross-domain ranking of the collection of complete input phrases further comprises:

calculating a respective integrated ranking score Rj for each complete input phrase Pj of the collection of complete input phrases based on a respective formula

$$R_j = \frac{\omega_v v_j + \omega_\mu \mu_j}{\omega_v + \omega_\mu},$$

wherein vj is a respective frequency of the complete input phrase Pj that has been normalized for cross-domain comparison, μj is a respective phrase indexing power of the complete input phrase Pj across the plurality of domains, ωv and ωμ are relative weights given to domain-specific frequency and cross-domain indexing power in the ranking of the collection of complete input phrases; and
generating the cross-domain ranking of the collection of complete input phrases based on the respective integrated ranking scores for the collection of complete input phrases.

6. The method of claim 1, further comprising:

receiving the initial user input from a user;
identifying, from the collection of complete input phrases, a subset of complete input phrases that each begins with the initial user input;
ranking the subset of complete input phrases in accordance with the cross-domain ranking of the collection of complete input phrases;
selecting a predetermined number of unique relevant domains based on respective domains associated with each of the subset of complete input phrases; and
selecting at least one top-ranked input phrase from each of the unique relevant domains as one of the auto-completion candidates to be presented to the user.

7. The method of claim 1, wherein presenting one or more auto-completion candidates further comprises:

displaying a first portion of a first auto-completion candidate of the one or more auto-completion candidates, wherein the first portion of the first auto-completion candidate precedes or ends at a respective template word in the first auto-completion candidate;
receiving a subsequent user input specifying one or more normal words corresponding to the respective template word; and
in response to receiving the subsequent user input, displaying a second portion of the first auto-completion candidate succeeding the respective template word in the first auto-completion candidate.

8. A non-transitory computer-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:

receiving a training corpus comprising a collection of complete input phrases that span a plurality of semantically distinct domains;
for each of a plurality of distinct words present in the collection of complete input phrases, calculating a respective word indexing power across the plurality of domains based on a respective normalized entropy for said word, wherein the respective normalized entropy is based on a total number of domains in which said word appears and how representative said word is for each of the plurality of domains;
for each complete input phrase in the collection of complete input phrases, calculating a respective phrase indexing power across the plurality of domains based on an aggregation of the respective word indexing powers of all constituent words of said complete input phrase;
obtaining respective domain-specific usage frequencies of the complete input phrases in the training corpus; and
generating a cross-domain ranking of the collection of complete input phrases based at least on the respective phrase indexing powers of the complete input phrases and the respective domain-specific usage frequencies of the complete input phrases.

9. The computer-readable medium of claim 8, wherein the operations further comprise:

providing the cross-domain ranking of the collection of complete input phrases to a user device, wherein the user device presents one or more auto-completion candidates in response to an initial user input in accordance with at least the cross-domain ranking of the collection of complete input phrases.

10. The computer-readable medium of claim 9, wherein calculating the respective word indexing power across the plurality of domains for each word wi of the plurality of distinct words further comprises:

calculating the respective normalized entropy εi for the word wi based on a respective formula

$$\varepsilon_i = -\frac{1}{\log K}\sum_{k=1}^{K}\frac{c_{i,k}}{t_i}\log\frac{c_{i,k}}{t_i},$$

wherein K is a total number of domains in the plurality of domains, ci,k is a total number of times wi occurs in a domain dk of the plurality of domains, and ti=Σkci,k is a total number of times wi occurs in the collection of complete input phrases, and wherein the respective word indexing power of the word wi is (1−εi).

11. The computer-readable medium of claim 9, wherein calculating the respective phrase indexing power across the plurality of domains for each complete input phrase Pj of the collection of complete input phrases further comprises:

distinguishing template words from normal words in the complete input phrase Pj, a template word being a word that is used to represent a respective category of normal words in a particular complete input phrase and that is substituted by one or more normal words when provided as an input to the digital assistant by a user;
calculating the respective phrase indexing power μj for the complete input phrase Pj based on a respective formula

$$\mu_j = b_T\,n_T(j) + \frac{1}{n_N(j)+n_T(j)}\left[\sum_{i=1}^{n_N(j)}(1-\varepsilon_i)+\sum_{i=1}^{n_T(j)}(1-\varepsilon_i)\right],$$

wherein nN(j) is a total number of normal words present in the complete input phrase Pj, nT(j) is a total number of template words present in the complete input phrase Pj, (1−εi) is the respective word indexing power of each word wi, and bT is a respective template bias multiplier used to calculate the weight bias bTnT(j) for the input phrase Pj.

12. The computer-readable medium of claim 9, wherein generating the cross-domain ranking of the collection of complete input phrases further comprises:

calculating a respective integrated ranking score Rj for each complete input phrase Pj of the collection of complete input phrases based on a respective formula

$$R_j = \frac{\omega_v v_j + \omega_\mu \mu_j}{\omega_v + \omega_\mu},$$

wherein vj is a respective frequency of the complete input phrase Pj that has been normalized for cross-domain comparison, μj is a respective phrase indexing power of the complete input phrase Pj across the plurality of domains, ωv and ωμ are relative weights given to domain-specific frequency and cross-domain indexing power in the ranking of the collection of complete input phrases; and
generating the cross-domain ranking of the collection of complete input phrases based on the respective integrated ranking scores for the collection of complete input phrases.

13. The computer-readable medium of claim 8, wherein the operations further comprise:

receiving the initial user input from a user;
identifying, from the collection of complete input phrases, a subset of complete input phrases that each begins with the initial user input;
ranking the subset of complete input phrases in accordance with the cross-domain ranking of the collection of complete input phrases;
selecting a predetermined number of unique relevant domains based on respective domains associated with each of the subset of complete input phrases; and
selecting at least one top-ranked input phrase from each of the unique relevant domains as one of the auto-completion candidates to be presented to the user.

14. The computer-readable medium of claim 8, wherein presenting one or more auto-completion candidates further comprises:

displaying a first portion of a first auto-completion candidate of the one or more auto-completion candidates, wherein the first portion of the first auto-completion candidate precedes or ends at a respective template word in the first auto-completion candidate;
receiving a subsequent user input specifying one or more normal words corresponding to the respective template word; and
in response to receiving the subsequent user input, displaying a second portion of the first auto-completion candidate succeeding the respective template word in the first auto-completion candidate.

15. The computer-readable medium of claim 14, wherein the operations further comprise:

displaying one or more suggestions corresponding to the respective template word based on a user-specific vocabulary comprising at least one of a plurality of proper nouns associated with the user, wherein the subsequent user input is a selection of the one or more displayed suggestions.

16. A system, comprising:

one or more processors; and
memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising:
receiving a training corpus comprising a collection of complete input phrases that span a plurality of semantically distinct domains;
for each of a plurality of distinct words present in the collection of complete input phrases, calculating a respective word indexing power across the plurality of domains based on a respective normalized entropy for said word, wherein the respective normalized entropy is based on a total number of domains in which said word appears and how representative said word is for each of the plurality of domains;
for each complete input phrase in the collection of complete input phrases, calculating a respective phrase indexing power across the plurality of domains based on an aggregation of the respective word indexing powers of all constituent words of said complete input phrase;
obtaining respective domain-specific usage frequencies of the complete input phrases in the training corpus; and
generating a cross-domain ranking of the collection of complete input phrases based at least on the respective phrase indexing powers of the complete input phrases and the respective domain-specific usage frequencies of the complete input phrases.

17. The system of claim 16, wherein the operations further comprise:

providing the cross-domain ranking of the collection of complete input phrases to a user device, wherein the user device presents one or more auto-completion candidates in response to an initial user input in accordance with at least the cross-domain ranking of the collection of complete input phrases.

18. The system of claim 16, wherein calculating the respective word indexing power across the plurality of domains for each word wi of the plurality of distinct words further comprises:

calculating the respective normalized entropy εi for the word wi based on a respective formula

$$\varepsilon_i = -\frac{1}{\log K}\sum_{k=1}^{K}\frac{c_{i,k}}{t_i}\log\frac{c_{i,k}}{t_i},$$

wherein K is a total number of domains in the plurality of domains, ci,k is a total number of times wi occurs in a domain dk of the plurality of domains, and ti=Σkci,k is a total number of times wi occurs in the collection of complete input phrases, and wherein the respective word indexing power of the word wi is (1−εi).

19. The system of claim 16, wherein calculating the respective phrase indexing power across the plurality of domains for each complete input phrase Pj of the collection of complete input phrases further comprises:

distinguishing template words from normal words in the complete input phrase Pj, a template word being a word that is used to represent a respective category of normal words in a particular complete input phrase and that is substituted by one or more normal words when provided as an input to the digital assistant by a user;
calculating the respective phrase indexing power μj for the complete input phrase Pj based on a respective formula

$$\mu_j = b_T\,n_T(j) + \frac{1}{n_N(j)+n_T(j)}\left[\sum_{i=1}^{n_N(j)}(1-\varepsilon_i)+\sum_{i=1}^{n_T(j)}(1-\varepsilon_i)\right],$$

wherein nN(j) is a total number of normal words present in the complete input phrase Pj, nT(j) is a total number of template words present in the complete input phrase Pj, (1−εi) is the respective word indexing power of each word wi, and bT is a respective template bias multiplier used to calculate the weight bias bTnT(j) for the input phrase Pj.

20. The system of claim 16, wherein generating the cross-domain ranking of the collection of complete input phrases further comprises:

calculating a respective integrated ranking score Rj for each complete input phrase Pj of the collection of complete input phrases based on a respective formula

$$R_j = \frac{\omega_v v_j + \omega_\mu \mu_j}{\omega_v + \omega_\mu},$$

wherein vj is a respective frequency of the complete input phrase Pj that has been normalized for cross-domain comparison, μj is a respective phrase indexing power of the complete input phrase Pj across the plurality of domains, ωv and ωμ are relative weights given to domain-specific frequency and cross-domain indexing power in the ranking of the collection of complete input phrases; and
generating the cross-domain ranking of the collection of complete input phrases based on the respective integrated ranking scores for the collection of complete input phrases.

21. The system of claim 20, wherein the operations further comprise:

for each complete input phrase Pj, normalizing the respective domain-specific frequency of said complete input phrase Pj by a maximum phrase count of a single input phrase observed in the training corpus.

22. The system of claim 21, wherein the operations further comprise:

updating the respective rank of each complete input phrase Pj based on a user-specific recency bias bR, wherein the user-specific recency bias is based on a number of times that a particular user has used the complete input phrase Pj.

23. The system of claim 16, wherein the operations further comprise:

receiving the initial user input from a user;
identifying, from the collection of complete input phrases, a subset of complete input phrases that each begins with the initial user input;
ranking the subset of complete input phrases in accordance with the cross-domain ranking of the collection of complete input phrases;
selecting a predetermined number of unique relevant domains based on respective domains associated with each of the subset of complete input phrases; and
selecting at least one top-ranked input phrase from each of the unique relevant domains as one of the auto-completion candidates to be presented to the user.

24. The system of claim 16, wherein presenting one or more auto-completion candidates further comprises:

displaying a first portion of a first auto-completion candidate of the one or more auto-completion candidates, wherein the first portion of the first auto-completion candidate precedes or ends at a respective template word in the first auto-completion candidate;
receiving a subsequent user input specifying one or more normal words corresponding to the respective template word; and
in response to receiving the subsequent user input, displaying a second portion of the first auto-completion candidate succeeding the respective template word in the first auto-completion candidate.

25. The system of claim 24, wherein the operations further comprise:

displaying one or more suggestions corresponding to the respective template word based on a user-specific vocabulary comprising at least one of a plurality of proper nouns associated with the user, wherein the subsequent user input is a selection of the one or more displayed suggestions.
Patent History
Publication number: 20140365880
Type: Application
Filed: Jun 6, 2014
Publication Date: Dec 11, 2014
Patent Grant number: 9582608
Inventor: Jerome R. BELLEGARDA (Saratoga, CA)
Application Number: 14/298,720
Classifications
Current U.S. Class: Input Of Abbreviated Word Form (715/261); Ranking, Scoring, And Weighting Records (707/748)
International Classification: G06F 17/30 (20060101); G06F 17/27 (20060101);