System and method for independently recognizing and selecting actions and objects in a speech recognition system
A method for processing a call is disclosed. The method receives a speech input via a call and transforms the speech input into a textual format. The method also creates a list of salient terms of actions and objects from the text, adjusts the confidence level of objects on the list if the dominant term is an action, and selects a complementary object from the list to combine with the action to form an action-object pair. The method further adjusts a confidence level of actions on the list if the dominant term is an object and selects a complementary action from the list to combine with the object to form the action-object pair, and routes the call based on the action-object pair.
The present disclosure relates generally to speech recognition and, more particularly, to a system and method for independently recognizing and selecting actions and objects.
BACKGROUND
Many speech recognition systems utilize specialized computers that are configured to process human speech and carry out some task based on the speech. Some of these systems support “natural language” type interactions between users and automated call routing (ACR) systems. Natural language call routing allows callers to state the purpose of the call “in their own words.”
A goal of a typical ACR application is to accurately determine why a customer is calling and to quickly route the customer to an appropriate agent or destination for servicing. Research has shown that callers prefer speech recognition systems to keypad entry or touchtone menu driven systems.
As suggested above, natural language ACR systems attempt to interpret the intent of the customer based on the spoken language. When a speech recognition system partially misinterprets the caller's intent, significant problems can result. A caller who is misrouted is generally an unhappy customer. Misrouted callers often terminate the call or hang up when they realize that there has been a mistake. A caller who does not hang up will typically talk to an operator who tries to route the call. Routing a caller to an undesired location and then to a human operator leads to considerable inefficiencies for a business. Most call routing systems handle a huge volume of calls and, even if a small percentage of calls are mishandled, the costs associated with the mishandled calls can be significant.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is directed generally to integrating speech enabled automated call routing with action-object technology. Traditional automatic call routing systems assign a correct destination for a call 50% to 80% of the time. Particular embodiments of the disclosed system and method using action-object tables may achieve a correct destination assignment 85 to 95% of the time. In some embodiments, a semantic model may be used to create an action-object pair that further increases call routing accuracy while reducing costs. In particular implementations, the correct call destination routing rate may approach the theoretical limit of 100%. Due to higher effective call placement rates, the number of abandoned calls (e.g., caller hang-ups prior to completing their task) may be significantly reduced, thereby reducing operating costs and enhancing customer satisfaction.
In accordance with the teachings of the present disclosure, a call may be routed based on a selectable action-object pair. In practice, a call is received from a caller and a received speech input is converted into text or “text configurations,” which may be the same as, similar to, or can be associated with, known actions and objects. Generally, objects are related to nouns and actions are related to verbs. The converted text may be compared to tables of known text configurations representing objects and actions. A confidence level may be assigned to the recognized actions and objects based on text similarities and other rules. An action-object list may be created that contains recognized actions and objects and their confidence levels. In some embodiments, the entry (action or object) in the list with the highest confidence level may be selected as a dominant item. If an action is dominant, a system incorporating teachings disclosed herein may look for a complementary object. Likewise, if an object is dominant, the system may look for a complementary action.
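The comparison of converted text against tables of known actions and objects can be illustrated with a minimal sketch. It assumes a simple string-similarity score (Python's standard difflib) as the confidence level; the disclosure does not specify a scoring method, and the function names, the table contents beyond the terms listed in the claims, and the min_score cutoff are hypothetical.

```python
from difflib import SequenceMatcher

# Known text configurations; the entries echo terms listed in the claims.
KNOWN_ACTIONS = ["acquire", "cancel", "change", "inquire", "inform", "how to use"]
KNOWN_OBJECTS = ["DSL", "basic service", "call notes", "caller ID", "bill payment"]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; stands in for a confidence level."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_action_object_list(phrases, min_score=0.6):
    """Return (kind, known_term, confidence) entries for recognized phrases.

    min_score is an assumed cutoff below which a phrase is not treated
    as a recognized action or object.
    """
    entries = []
    for phrase in phrases:
        for kind, table in (("action", KNOWN_ACTIONS), ("object", KNOWN_OBJECTS)):
            # Closest known term in this table and its similarity score.
            best = max(table, key=lambda known: similarity(phrase, known))
            score = similarity(phrase, best)
            if score >= min_score:
                entries.append((kind, best, score))
    return entries
```

A production system would likely combine such a score with other rules, as the text notes; the sketch only shows how a list of recognized actions and objects with confidence levels might be assembled.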
In some implementations, when an action is dominant, remaining actions may be masked and the confidence level of the complementary objects in the action-object list may be adjusted. Conversely, if an object is dominant, the remaining objects may be masked and the confidence level of complementary actions in the action-object list may be adjusted. An adjustment to an assigned confidence level may be based, for example, on the likelihood that the prospective complement in the action-object list is consistent with the dominant entry. Depending upon implementation details, a call may be routed based on a dominant action and a complementary object or a dominant object and a complementary action.
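The selection logic described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the Term class, the VALID_PAIRS consistency set, and the multiplicative penalty applied to inconsistent complements are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Term:
    kind: str          # "action" or "object"
    text: str
    confidence: float


# Hypothetical set of action-object combinations considered consistent.
VALID_PAIRS = {
    ("acquire", "DSL"),
    ("cancel", "DSL"),
    ("inquire", "bill payment"),
    ("change", "caller ID"),
}


def select_pair(terms, penalty=0.5):
    """Pick the dominant term, mask its own kind, and choose a complement.

    Returns an (action, object) tuple, or None if no complement exists.
    The penalty factor for inconsistent complements is an assumption.
    """
    if not terms:
        return None
    dominant = max(terms, key=lambda t: t.confidence)
    candidates = []
    for term in terms:
        if term.kind == dominant.kind:
            continue  # entries of the dominant kind are masked
        if dominant.kind == "action":
            action, obj = dominant.text, term.text
        else:
            action, obj = term.text, dominant.text
        # Adjust the complement's confidence by consistency with the dominant entry.
        adjusted = term.confidence
        if (action, obj) not in VALID_PAIRS:
            adjusted *= penalty
        candidates.append((adjusted, action, obj))
    if not candidates:
        return None
    _, action, obj = max(candidates, key=lambda c: c[0])
    return action, obj
```

In this sketch an inconsistent complement is down-weighted rather than removed outright; an implementation that tags such entries as invalid could instead drop them from the candidate list.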
Referring now to
An illustrative embodiment of SECRS 118 may be a call center having a plurality of agent terminals attached. Thus, while only a single operator 130 is shown in
In a particular embodiment, action-object routing module 140 includes an action-object lookup table for matching action-object pairs to desired call routing destinations. This process may be better understood through consideration of
When a speech input conversion creates a dominant action (e.g., an action has the highest confidence level in the action-object list), a system like SECRS 118 of
The high scoring action may have been selected, the actions may have been masked, and objects that are inconsistent with the selected action may be tagged as invalid. Examples of invalid action-object combinations can be understood by referring to
Based on the call routing destination 208, a call received at a call routing network like SECRS 118 may be routed to a final destination, such as the billing department 120 or the technical support service destination 124 depicted in
Referring to
In some cases, many possible actions and objects may be detected or created from the word strings. A method incorporating teachings of the present disclosure may attempt to determine and select a most probable action and object from a list of preferred objects and actions. To aid in this resolution, a synonym table such as the synonym table of
In summary, at step 310 multiple actions and multiple objects can be identified from the list of salient terms and assigned a confidence level according to the likelihood that a particular action or object identifies a customer's intent and thus will lead to a successful routing of the call. The confidence level can be assigned to an action or an object based on many criteria, such as text similarity and business rules. In a particular example, a caller's number (caller ID) may be utilized to assign a high confidence value to the action “acquire,” and a low confidence value to the actions “change” or “cancel,” if the caller does not currently have service. In the event that a confidence level for an action-object pair is below a predetermined level, the call may be routed to a human operator or agent terminal.
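The caller ID example and the fallback to a human operator can be sketched as below. The boost and damp factors, the threshold value, and the function names are assumptions made for illustration; the disclosure states only that confidence may be raised or lowered and that a predetermined level exists.

```python
def apply_service_rule(action_confidence, has_service, boost=1.5, damp=0.5):
    """Return a copy of the action confidences adjusted by a business rule.

    If the caller (identified, for example, by caller ID) has no current
    service, "acquire" is made more likely and "change"/"cancel" less likely.
    The boost and damp factors are hypothetical.
    """
    adjusted = dict(action_confidence)
    if not has_service:
        for action in adjusted:
            if action == "acquire":
                adjusted[action] = min(1.0, adjusted[action] * boost)
            elif action in ("change", "cancel"):
                adjusted[action] = adjusted[action] * damp
    return adjusted


def destination_or_operator(pair_confidence, destination, threshold=0.6):
    """Route to the destination only if the pair confidence meets the threshold."""
    return destination if pair_confidence >= threshold else "operator"
```

How the confidence of an action-object pair is derived from its two members is not specified in the text, so the sketch takes it as an input.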
In decision step 312, the action or object with the highest confidence level is selected and marked as the dominant term. After a dominant term is selected, the method proceeds to find a complement for the dominant term. For example, if the dominant term is an object, the complement will be an action, and vice versa. If an action is dominant, all other actions in the action-object list can be invalidated, tagged, or masked, and objects that are inconsistent with the dominant action can also be tagged as invalid, as in step 320. The process of invalidating objects based on a dominant action can be further explained by referring to
When it is determined at step 312 that an object is dominant (i.e., has the highest confidence level in the action-object list), a search for a complementary action is conducted. Objects remaining in the action-object list and actions that are inconsistent with the dominant object are masked or tagged as invalid, as in step 314. The search for a complementary action can ignore objects and invalid actions. The method again refers to the action-object list to select a complementary action having the highest confidence level to complement the dominant object in step 318. An action-object pair is created at step 326, the call is routed at step 328, and the process ends at 330.
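Once an action-object pair has been formed, the routing step amounts to a table lookup. The following is a minimal sketch assuming a dictionary keyed by (action, object); the specific table entries and destination names are hypothetical and are not taken from the figures.

```python
# Hypothetical action-object routing table mapping pairs to destinations.
ROUTING_TABLE = {
    ("inquire", "bill payment"): "billing department",
    ("how to use", "DSL"): "technical support",
    ("acquire", "DSL"): "sales",
}


def route_call(action, obj, default="operator"):
    """Return the destination for an action-object pair.

    Pairs missing from the table fall back to a human operator,
    which is an assumed default for this sketch.
    """
    return ROUTING_TABLE.get((action, obj), default)
```

Keying the table on the pair rather than on the raw utterance keeps the table small, since many different word strings can map to the same action and object.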
In practice, it may be beneficial to convert word strings such as “I want to have” to actions such as “get.” This substantially reduces the size of the action and object tables. As shown in
The above-disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments that fall within the true spirit and scope of the present invention. Thus, to the maximum extent allowed by law, the scope of the present invention is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.
Claims
1. A method for processing a call comprising:
- receiving a speech input via a call;
- transforming the speech input into text;
- creating a list of salient terms comprising actions and objects from the text, wherein the list has a dominant term;
- adjusting a confidence level of objects on the list if the dominant term is an action and selecting a complementary object from the list to combine with the action to form an action-object pair;
- adjusting a confidence level of actions on the list if the dominant term is an object and
- selecting a complementary action from the list to combine with the object to form the action-object pair; and
- routing the call based on the action-object pair.
2. The method of claim 1, further comprising masking actions and invalid objects if the dominant term is an action.
3. The method of claim 1, further comprising masking objects and invalid actions if the dominant term is an object.
4. The method of claim 1, further comprising parsing the text into an action list and an object list.
5. The method of claim 1, further comprising assigning an initial confidence level to each of the actions and the objects contained in the list of salient terms.
6. The method of claim 1, further comprising adjusting the confidence level of objects to a value that represents a probability that a given object represents an intent of the caller.
7. The method of claim 1, further comprising adjusting the confidence level of actions to a value that represents a probability that a given action represents an intent of the caller.
8. The method of claim 1, wherein the action is selected from a group consisting of a verb and an adverb-verb combination.
9. The method of claim 1, wherein the object is selected from a group consisting of a noun and an adjective-noun combination.
10. A system for routing calls comprising:
- a call routing system having an interface, the call routing system configured to receive a call; to transform a speech signal received via the call into text; and to create a list of salient terms containing actions and objects from the text, wherein the list contains dominant terms;
- a synonym module associated with the call routing system and operable to adjust a confidence level of invalid objects in the list when there is a dominant action and to adjust a confidence level of invalid actions when there is a dominant object;
- a pairing module associated with the call routing system and operable to select a complementary action from the list when there is a dominant object and a complementary object when there is a dominant action, to create an action-object pair; and
- a switch to route the call based on the action-object pair.
11. The system of claim 10, wherein the call routing system uses phonemes to convert the speech input to a word string.
12. The system of claim 11, wherein the call routing system assigns a confidence value to the word string.
13. The system of claim 11, wherein the call routing system parses the word string into a respective action and a respective object.
14. The system of claim 11, wherein the call routing system assigns a confidence value to the word string and the confidence value represents a probability that the word string represents an intent of the caller.
15. The system of claim 10, wherein the call routing system assigns a confidence value to the object and the confidence value represents a probability that the object represents an intent of the caller.
16. The system of claim 10, wherein the call routing system assigns a confidence value to the action and the confidence value represents a probability that the action represents an intent of the caller.
17. The system of claim 10, wherein the action is one of a verb or an adverb-verb combination.
18. The system of claim 10, wherein the object is one of a noun or an adjective-noun combination.
19. The system of claim 10, wherein the call routing system is operable to route the call to a destination.
20. A system comprising:
- an acoustic engine configured to accept a speech input and to produce a textual version of at least a portion of the speech input as its output;
- a semantic engine coupled to the acoustic engine and operable to identify an action and an object indicated by the textual version;
- a probability system operable to assign confidence levels to the action and the object; and
- an action-object routing table operable to provide a routing destination based at least partially on the confidence levels assigned to the action and the object.
21. The system of claim 20 further comprising a plurality of different speech inputs.
22. The system of claim 20 wherein the action is selected from a list of expected utterances comprising acquire, cancel, change, inquire, inform, and how to use.
23. The system of claim 20 wherein the object is selected from a list of expected utterances comprising DSL, basic service, call notes, caller ID, bill payment, other providers, coupon specials, names and number, and store locations.
24. The system of claim 20 further comprising memory storing a library that comprises a plurality of expected actions.
25. The system of claim 20 further comprising a tuning module operable to accept an input intended to improve a recognition rate of the acoustic engine.
Type: Application
Filed: Jan 14, 2005
Publication Date: Jul 20, 2006
Patent Grant number: 7627096
Inventors: Robert Bushey (Cedar Park, TX), Michael Sabourin (Saint Lambert), Carl Potvin (La Prairie), Benjamin Knott (Round Rock, TX), John Martin (Austin, TX)
Application Number: 11/036,201
International Classification: G10L 15/26 (20060101);