SYSTEM AND METHOD OF UTILIZING A HYBRID SEMANTIC MODEL FOR SPEECH RECOGNITION
A system includes a network interface, a speech input conversion component, and a routing module. Speech input is received in connection with a call. At least a segment of the speech input is transformed into a first textual format. A first list of entries is generated based, at least partially, on consideration of the first textual format. The first list includes at least one action with a corresponding confidence level and at least one object with another corresponding confidence level. An entry of the first list having a higher corresponding confidence level is selected, and a second textual format is output. A second list is generated based, at least partially, on consideration of the selected entry and the second textual format. A routing option is suggested based on the selected entry and a pairing entry in the second list.
This application is a continuation application of, and claims priority to, U.S. patent application Ser. No. 11/036,204, filed Jan. 14, 2005, the contents of which are expressly incorporated herein by reference in their entirety.
FIELD OF THE DISCLOSURE

The present disclosure relates generally to speech recognition and, more particularly, to a system and method of utilizing a hybrid semantic model for speech recognition.

BACKGROUND

Many speech recognition systems utilize specialized computers that are configured to process human speech and carry out some task based on the speech. Some of these systems support “natural language” type interactions between users and automated call routing (ACR) systems. Natural language call routing allows callers to state the purpose of the call “in their own words.”
A goal of a typical ACR application is to accurately determine why a customer is calling and to quickly route the customer to an appropriate agent or destination for servicing. Research has shown that callers prefer speech recognition systems to keypad entry or touchtone menu driven systems.
As suggested above, natural language ACR systems attempt to interpret the intent of the customer based on the spoken language. When a speech recognition system partially misinterprets the caller's intent, significant problems can result. A caller who is misrouted is generally an unhappy customer. Misrouted callers often terminate the call, or hang up, when they realize that there has been a mistake. If a caller does not hang up, they will typically talk to an operator who tries to route the call. Routing a caller to an undesired location and then to a human operator leads to considerable inefficiencies for a business. Most call routing systems handle a huge volume of calls and, even if only a small percentage of calls are mishandled, the costs associated with the mishandled calls can be significant.
The present disclosure is directed generally to integrating speech enabled automated call routing with action-object technology. Traditional automatic call routing systems assign a correct destination for a call 50% to 80% of the time. Particular embodiments of the disclosed system and method using action-object tables may achieve a correct destination assignment 85% to 95% of the time. In some embodiments, a semantic model may be used to create an action-object pair that further increases call routing accuracy while reducing costs. In particular implementations, the correct call destination routing rate may approach the theoretical limit of 100%. Due to higher effective call placement rates, the number of abandoned calls (e.g., caller hang-ups prior to completing their task) may be significantly reduced, thereby reducing operating costs and enhancing customer satisfaction.
In accordance with the teachings of the present disclosure, a call may be routed based on a selectable action-object pair. In practice, a call is received from a caller and a received speech input is converted into text or “text configurations,” which may be the same as, similar to, or can be associated with, known actions and objects. Generally, objects are related to nouns and actions are related to verbs. The converted text may be compared to tables of known text configurations representing objects and actions. A confidence level may be assigned to the recognized actions and objects based on text similarities and other rules. An action-object list may be created that contains recognized actions and objects and their confidence levels. In some embodiments, the entry (action or object) in the list with the highest confidence level may be selected as a dominant item. If an action is dominant a system incorporating teachings disclosed herein may look for a complementary object. Likewise, if an object is dominant, the system may look for a complementary action.
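For illustration only, the selection of a dominant entry from a combined action-object list described above can be sketched as follows. The entry labels, confidence values, and function names are hypothetical examples, not taken from the disclosure.

```python
# Illustrative sketch: selecting a dominant entry from an action-object list.
# Each entry is (kind, label, confidence); all values here are invented.

def select_dominant(action_object_list):
    """Return the entry with the highest assigned confidence level."""
    return max(action_object_list, key=lambda entry: entry[2])

# Example list built from a converted utterance such as "I want to pay my bill":
entries = [
    ("action", "pay", 0.92),
    ("action", "cancel", 0.10),
    ("object", "bill", 0.85),
    ("object", "service", 0.15),
]

kind, label, confidence = select_dominant(entries)
# A dominant action triggers a search for a complementary object, and vice versa.
complement_kind = "object" if kind == "action" else "action"
```

Here the action “pay” is dominant, so the system would next look for a complementary object.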
In some implementations, when an action is dominant, remaining actions may be masked and the confidence level of the complementary objects in the action-object list may be adjusted. Conversely, if an object is dominant, the remaining objects may be masked and the confidence level of complementary actions in the action-object list may be adjusted. An adjustment to an assigned confidence level may be based, for example, on the likelihood that the prospective complement in the action-object list is consistent with the dominant entry. Depending upon implementation details, a call may be routed based on a dominant action and a complementary object or a dominant object and a complementary action.
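The masking and confidence adjustment just described can be sketched, under stated assumptions, as below. The consistency table, boost, and penalty factors are invented for illustration; the disclosure does not specify how the adjustment is computed.

```python
# Hypothetical sketch: mask non-complements and rescale complement confidence
# based on consistency with the dominant entry. The CONSISTENT set and the
# boost/penalty factors are illustrative assumptions.

CONSISTENT = {("pay", "bill"), ("cancel", "service"), ("acquire", "service")}

def mask_and_adjust(entries, dominant, boost=1.2, penalty=0.5):
    """Keep only prospective complements of the dominant entry; adjust each
    complement's confidence by its consistency with the dominant entry."""
    kind, label, _ = dominant
    complement_kind = "object" if kind == "action" else "action"
    adjusted = []
    for e_kind, e_label, conf in entries:
        if e_kind != complement_kind:
            continue  # mask remaining actions (or remaining objects)
        pair = (label, e_label) if kind == "action" else (e_label, label)
        factor = boost if pair in CONSISTENT else penalty
        adjusted.append((e_kind, e_label, min(conf * factor, 1.0)))
    return adjusted
```

With a dominant action “pay”, the object “bill” would be boosted while an inconsistent object would be penalized.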
Referring now to
An illustrative embodiment of SECRS 118 may be a call center having a plurality of agent terminals attached. Thus, while only a single operator 130 is shown in
In a particular embodiment, action-object routing module 140 includes an action-object lookup table for matching action-object pairs to desired call routing destinations. This process may be better understood through consideration of
When a speech input conversion creates a dominant action (e.g., an action has the highest confidence level in the action-object list), a system like SECRS 118 of
In practice, a secondary conversion may be performed, or a second list generated, by taking the initial speech received from the caller and processing it a second time. During the second conversion, the semantic model 220 may look specifically for consistent objects while ignoring actions if an action had the highest overall confidence level. In such a case, the high-scoring action may have been selected, the remaining actions may have been masked, and objects that are inconsistent with the selected action may be tagged as invalid. Examples of invalid action-object combinations can be understood by referring to
If the speech input conversion creates a dominant object, a secondary conversion may be initiated to create an action list to assist in selecting a complementary action. The secondary conversion may take the initial speech received from the caller and process it a second time. It may also rely on an output from the processing performed in connection with the earlier conversion. During the second conversion, semantic model 220 may look specifically for actions while ignoring objects. The confidence levels of actions may also be adjusted to penalize actions that are inconsistent with the selected object. Thus, in either case, a call may be routed based on a dominant entry and a valid complement to the dominant entry.
The results of a reiterative speech recognition process may be provided to action-object routing table 230. Routing table 230 may receive action-object pairs 206 and produce a call routing destination 208. Based on the call routing destination 208, a call received at a call routing network like SECRS 118 may be routed to a final destination, such as the billing department 120 or the technical support service destination 124 depicted in
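The routing table lookup described above can be sketched, for illustration, as a simple mapping from action-object pairs to destinations. The table contents and the fallback destination are hypothetical; the disclosure only specifies that pairs map to routing destinations.

```python
# Illustrative sketch of an action-object routing table lookup.
# Pairs and destination names are invented placeholders.

ROUTING_TABLE = {
    ("pay", "bill"): "billing department",
    ("acquire", "service"): "sales",
    ("repair", "line"): "technical support",
}

def route(action, obj, default="human operator"):
    """Map an action-object pair to a call routing destination, falling back
    to a human operator when no pairing is known."""
    return ROUTING_TABLE.get((action, obj), default)
```

An unrecognized pair falls through to an agent, mirroring the low-confidence fallback described later in connection with step 310.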
Referring to
In some cases, many possible actions and objects may be detected or created from the word strings. A method incorporating teachings of the present disclosure may attempt to determine and select a most probable action and object from a list of preferred objects and actions. To aid in this resolution, a synonym table such as the synonym table of
A confidence level may be assigned to an action and/or an object based on many criteria, such as textual similarities, business rules, etc., in step 310. Confidence levels may also be assigned based on a combination of factors, and some of these factors may not involve speech recognition. For example, if a caller does not currently have service, the caller's number (caller ID) may be utilized to assign a high confidence level to the action “acquire” and a low confidence level to the actions “change” or “cancel.” In the event that a confidence level for an action-object pair is below a predetermined level, the call may be routed to a human operator or agent terminal.
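The caller-ID business rule in the example above can be sketched as follows. The adjustment amounts and the rule's form are assumptions for illustration; the disclosure does not specify how such rules combine with recognition scores.

```python
# Hedged sketch of a non-speech business rule adjusting action confidences,
# per the caller-ID example above. Weights are invented assumptions.

def apply_caller_id_rule(action_confidences, caller_has_service):
    """If the caller has no existing service, raise confidence in 'acquire'
    and lower confidence in 'change' and 'cancel'."""
    adjusted = dict(action_confidences)
    if not caller_has_service:
        adjusted["acquire"] = min(adjusted.get("acquire", 0.0) + 0.3, 1.0)
        for action in ("change", "cancel"):
            if action in adjusted:
                adjusted[action] *= 0.3  # penalize actions a new caller is unlikely to intend
    return adjusted
```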
An action-object list may be utilized at step 312 to select a dominant entry. If an action is selected as the dominant entry at step 334, other actions in the action-object list may be masked and objects that are inconsistent with the selected action may be tagged as invalid at step 336. The process of invalidating objects based on a dominant action can be further explained by referring to
Based on the dominant action, the confidence levels of the objects can be adjusted at step 338. The caller's utterance may be sent through the acoustic model again in step 340, and the acoustic model may create and store word strings, as shown in step 342. Word strings may be parsed into objects using the semantic model in step 344, and an object list may be formed in which each object is assigned a confidence level in step 346. When the list is sufficiently complete, the object having the highest confidence level may be selected to complement the dominant action, and an action-object pair may be created at step 330.
If at step 312 it is determined that an object has the highest confidence level, or is dominant, then a search for a complementary action may be conducted. Objects remaining in the action-object list, and actions that are inconsistent with the selected object, may be masked or tagged as invalid, as shown in step 316. Thus, such a method may ignore objects and invalid actions in the search for a complementary action when a dominant object has been selected.
Based on the dominant object, the confidence levels of listed actions may be adjusted at step 318. The original caller input may be sent through the acoustic model again in step 320, and the acoustic model may create and store word strings, as in step 322. Word strings may then be parsed into actions using the semantic model in step 324, and an action list may be formed in which each action is assigned a confidence level at step 326. The action having the highest confidence level (at step 328) may be selected to complement the dominant object, and an action-object pair may be passed at step 330. The call may then be routed at step 331, and the process ends at 332.
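The two second-pass branches (steps 316-330 for a dominant object and steps 336-346 for a dominant action) share the same shape, which can be sketched as below. The list contents and the fallback behavior when no valid complement remains are illustrative assumptions.

```python
# Illustrative sketch of the second-pass complement search common to both
# branches above. Labels and confidences are invented examples.

def select_complement(second_pass_list, invalid_labels):
    """From a re-parsed list of (label, confidence) pairs, drop entries tagged
    invalid for the dominant entry and return the highest-confidence survivor."""
    valid = [(label, conf) for label, conf in second_pass_list
             if label not in invalid_labels]
    if not valid:
        return None  # no consistent complement; fall back to an agent
    return max(valid, key=lambda pair: pair[1])
```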
In practice, it may be beneficial to convert word strings such as “I want to have” to an action such as “get.” This substantially reduces the size of the action and object tables. As shown in
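The synonym-table conversion just described — collapsing “I want to have” to “get” — can be sketched as a simple lookup. The table entries besides that example are hypothetical.

```python
# Sketch of a synonym table mapping caller phrasings onto canonical actions,
# per the "I want to have" -> "get" example above. Other entries are invented.

SYNONYMS = {
    "i want to have": "get",
    "i would like to get": "get",
    "sign me up for": "acquire",
    "turn off": "cancel",
}

def canonicalize(phrase):
    """Collapse a recognized word string to its canonical action, keeping the
    action and object tables small; unknown phrases pass through unchanged."""
    return SYNONYMS.get(phrase.lower().strip(), phrase)
```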
The above-disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments that fall within the true spirit and scope of the present invention. Thus, to the maximum extent allowed by law, the scope of the present invention is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.
Claims
1. A system, comprising:
- a network interface configured to receive a speech input in connection with a call;
- a speech input conversion component configured to: transform at least a segment of the speech input into a first textual format; generate a first list of entries based, at least partially, on consideration of the first textual format, the first list comprising at least one action having a corresponding confidence level and at least one object having another corresponding confidence level; select an entry of the first list having a higher corresponding confidence level; output a second textual format; generate a second list based, at least partially, on consideration of the selected entry and the second textual format; and
- a routing module configured to suggest a routing option for the call based on the selected entry and an associated pairing entry in the second list.
2. The system of claim 1, wherein the speech input conversion component is further configured to re-process the speech input to create an object list when an action is the selected entry.
3. The system of claim 1, wherein the speech input conversion component is further configured to re-process the speech input to create an action list when an object is the selected entry.
4. The system of claim 2, wherein the speech input conversion component is further configured to:
- include an associated confidence level with an object entry in the object list; and
- select an object as the pairing entry based on the associated confidence level.
5. The system of claim 3, wherein the speech input conversion component is further configured to:
- re-process the speech input to produce the action list with confidence levels; and
- select an action based on the confidence levels in the action list.
6. The system of claim 1, wherein the speech input conversion component is further configured to compare the first textual format to a list of word strings and to assign a probability to at least one word string included in the list of word strings.
7. The system of claim 6, wherein the speech input conversion component is further configured to assign an appropriate confidence level to the at least one word string.
8. The system of claim 1, wherein the entry selected is one of a verb and an adverb-verb combination.
9. The system of claim 1, wherein the entry selected is one of a noun or an adjective-noun combination.
10. The system of claim 1, wherein the speech input conversion component is further configured to utilize a synonym table to assist in converting the speech input into actions and objects.
11. A method, comprising:
- receiving a speech input in connection with a call;
- processing the speech input to generate a first action list and an object list;
- assigning a first confidence level to each action of the first action list and to each object of the object list;
- selecting a particular object with a high confidence level from the object list; and
- removing at least one action from the first action list, wherein the at least one action is inconsistent with the particular object.
12. The method of claim 11, further comprising assigning a second confidence level to each remaining action of the first action list based on the particular object.
13. The method of claim 12, further comprising:
- re-processing the speech input to generate a second action list;
- assigning a second confidence level to each action of the second action list;
- selecting a particular action with a high confidence level from the second action list; and
- suggesting a routing option for the call based on the particular action and the particular object.
14. The method of claim 13, further comprising routing the call to a destination.
15. The method of claim 13, wherein the first confidence level and the second confidence level are assigned based on a predetermined likelihood of reflecting an intent of a caller.
16. A method, comprising:
- receiving a speech input in connection with a call;
- processing the speech input to generate a first object list and an action list;
- assigning a first confidence level to each object of the first object list and to each action of the action list;
- selecting a particular action with a high confidence level from the action list; and
- removing at least one object from the first object list, wherein the at least one object is inconsistent with the particular action.
17. The method of claim 16, further comprising assigning a second confidence level to each remaining object of the first object list based on the particular action.
18. The method of claim 17, further comprising:
- re-processing the speech input to generate a second object list;
- assigning a second confidence level to each object of the second object list;
- selecting a particular object with a high confidence level from the second object list; and
- suggesting a routing option for the call based on the particular action and the particular object.
19. The method of claim 18, further comprising routing the call to a destination.
20. The method of claim 16, wherein the first confidence level and the second confidence level are assigned based on a predetermined likelihood of reflecting an intent of a caller.
Type: Application
Filed: Nov 11, 2008
Publication Date: Mar 12, 2009
Applicant: SBC Knowledge Ventures, L.P. (Reno, NV)
Inventors: Robert R. Bushey (Cedar Park, TX), Benjamin Anthony Knott (Round Rock, TX), John Mills Martin (Austin, TX)
Application Number: 12/268,894
International Classification: H04M 11/00 (20060101); H04M 1/64 (20060101); G10L 11/00 (20060101);