SYSTEM AND METHOD FOR ELICITING OPEN-ENDED NATURAL LANGUAGE RESPONSES TO QUESTIONS TO TRAIN NATURAL LANGUAGE PROCESSORS
Systems and methods for gathering text commands in response to a command context using a first crowdsourced job are discussed herein. A command context for a natural language processing system may be identified, where the command context is associated with a command context condition to provide commands to the natural language processing system. One or more command creators associated with one or more command creation devices may be selected. A first application on the one or more command creation devices may be configured to display command creation instructions for each of the one or more command creators to provide text commands that satisfy the command context, and to display a field for capturing a user-generated text entry to satisfy the command creation condition in accordance with the command creation instructions. Systems and methods for reviewing the text commands using second and third crowdsourced jobs are also presented herein.
This application is a continuation of U.S. patent application Ser. No. 15/257,217 filed Sep. 6, 2016 entitled “SYSTEM AND METHOD FOR ELICITING OPEN-ENDED NATURAL LANGUAGE RESPONSES TO QUESTIONS TO TRAIN NATURAL LANGUAGE PROCESSORS”; which claims priority to U.S. Provisional Patent Application No. 62/215,115, entitled “SYSTEM AND METHOD FOR ELICITING OPEN-ENDED NATURAL LANGUAGE RESPONSES TO QUESTIONS TO TRAIN NATURAL LANGUAGE PROCESSORS,” filed on Sep. 7, 2015, the entireties of which are incorporated herein by reference. This application is related to co-pending PCT Application No. PCT/US2016/50389, entitled “SYSTEM AND METHOD FOR ELICITING OPEN-ENDED NATURAL LANGUAGE RESPONSES TO QUESTIONS TO TRAIN NATURAL LANGUAGE PROCESSORS,” Attorney Docket No. 45HV-246284, filed Sep. 6, 2016, which is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
The field of the invention generally relates to gathering and reviewing text commands related to command contexts using unmanaged crowds to train natural language processing systems to recognize text commands for one or more command contexts.
BACKGROUND OF THE INVENTION
By translating voice data into text, speech recognition has played an important part in many Natural Language Processing (NLP) technologies. For instance, speech recognition has proven useful to technologies involving vehicles (e.g., in-car speech recognition systems), technologies involving health care, technologies involving the military and/or law enforcement, technologies involving telephony, and technologies that assist people with disabilities. Speech recognition systems are often trained and deployed to end-users.
The end-user deployment phase typically includes using a trained acoustic model to identify text in voice data provided by end-users. The training phase typically involves training an acoustic model in the speech recognition system to recognize text in voice data. The training phase often includes capturing voice data, transcribing the voice data into text, and storing pairs of voice data and text in transcription libraries. Capturing voice data in the training phase typically involves collecting different syllables, words, and/or phrases commonly used in speech. Depending on the context, these utterances may form the basis of commands to a computer system, requests to gather information, portions of dictations, or other end-user actions.
Conventionally, NLP systems captured voice data using teams of trained Natural Language Processing (NLP) trainers who were housed in a recording studio or other facility having audio recording equipment therein. The voice data capture process often involved providing the NLP trainers with a list of utterances, and recording the utterances using the audio recording equipment. Teams of trained transcribers in dedicated transcription facilities typically listened to the utterances, and manually transcribed the utterances into text.
Though useful, conventional NLP systems have problems accommodating the wide variety of utterances present in a given language. More specifically, different NLP end-users may provide different commands to perform similar tasks. As an example, it is often difficult to train an NLP system to recognize the different pronunciations, syntaxes, word orders, etc. that different NLP end-users use when requesting an NLP system to perform a specific task. It would be desirable to provide systems and methods that accurately and cost-effectively collect and store the utterances NLP end-users are likely to use for a command context.
SUMMARY OF THE INVENTION
Systems and methods for generating entity level annotated text utterances using unmanaged crowds are described herein. Utterances are then used to build state-of-the-art Named Entity Recognition (NER) models, used for natural language processing systems (e.g., dialogue systems). In some implementations, a wide variety of raw utterances are collected through a variant elicitation task. The relevance of these utterances may be verified by feeding these utterances back to a second crowd for a domain validation task. Utterances with potential spelling errors may be flagged; these errors may be verified with review by the second crowd before the affected utterances are discarded. These techniques, combined with periodic non-machine readable tasks (e.g., CAPTCHA captions and/or non-machine readable sound bites), may assist in preventing automated responses, and allow for the collection of high quality text utterances despite the inability to use the traditional gold test question approach for spam filtering. The utterances may be tagged with appropriate NER labels using unmanaged crowds.
Systems and methods for collecting text variants may define scenarios that are to be represented within a collection. Scenarios may be based upon what a given product is designed to support and are intended to provoke responses from workers in the crowd that simulate what a real user would say when using a particular voice recognition product.
Once the scenarios are defined, the system may present scenarios to users (e.g., users in an unmanaged crowd). Such scenarios may be presented through a crowdsourcing platform. The scenarios give the users context pertaining to the kind of product they are (hypothetically) using and the worker is asked to write a text command (variant) as if they were asking a voice recognition enabled personal assistant to perform a certain specified function. For example, the system may be used to collect variations of commands that might be used to turn off a device. In this example, the system may generate a job that asks users in the crowd: “Imagine you have just finished using the device. Think of two ways you might speak to the device to turn it off.”
The system may score a given response based on the number of users in the crowd that provided the given response. For instance, 600 users may respond with the command “shut down.” 200 users may respond with the command “turn off.” Lesser numbers of users may provide different responses. The system may determine that “shut down” and “turn off” are common responses (and therefore score them in a particular manner). In some instances, if a particular response (e.g., “shut down”) dominates over others (e.g., based on a percentage of the particular response exceeding a predefined threshold), then the system may no longer allow other users to provide the same response. This may be to elicit a variety of different responses, other than the dominant response (or dominant responses).
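The count-based scoring and dominance cutoff described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the normalization step, the 0.5 threshold, and the counts for responses other than “shut down” and “turn off” are all assumptions made for the example.

```python
from collections import Counter


def normalize(response):
    """Lowercase and collapse whitespace so trivially different entries match."""
    return " ".join(response.lower().split())


def score_responses(responses):
    """Return each distinct response's share of all collected responses."""
    counts = Counter(normalize(r) for r in responses)
    total = sum(counts.values())
    return {text: n / total for text, n in counts.items()}


def blocked_responses(responses, dominance_threshold=0.5):
    """Responses whose share exceeds the assumed threshold are no longer accepted."""
    shares = score_responses(responses)
    return {text for text, share in shares.items() if share > dominance_threshold}


# 600 and 200 come from the example above; the remaining counts are hypothetical.
collected = (
    ["shut down"] * 600
    + ["turn off"] * 200
    + ["power off"] * 150
    + ["go to sleep"] * 50
)
shares = score_responses(collected)
blocked = blocked_responses(collected)
```

With these hypothetical counts, only “shut down” crosses the threshold, so later users would be steered toward other phrasings while “turn off” remains acceptable.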
At any rate, once the variants have been collected, the system may perform a number of spellchecking stages. For example, the system may run the collected variants through an automated spellchecker. Variants are flagged during this process but not thrown out. The system may deploy another crowdsourcing job in which users are asked to identify whether an utterance really contains a spelling error for the variants that were flagged by the automatic spellchecker. If a variant is found to contain an error, it may be discarded.
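The two spellchecking stages can be sketched as an automated flagging pass followed by a crowd confirmation pass. The sketch below is illustrative only: the word-list lookup stands in for whatever automated spellchecker is used, and the function names, vote format, and majority rule are assumptions.

```python
def flag_variants(variants, dictionary):
    """Automated pass: flag (but keep) variants containing out-of-dictionary words."""
    flagged = []
    for variant in variants:
        if any(word not in dictionary for word in variant.lower().split()):
            flagged.append(variant)
    return flagged


def apply_crowd_review(variants, flagged, votes, min_agreement=0.5):
    """Discard a flagged variant only if the crowd confirms it contains an error.

    `votes` maps a variant to a list of booleans (True = "really misspelled").
    """
    kept = []
    for variant in variants:
        if variant in flagged:
            ballots = votes.get(variant, [])
            confirmed = bool(ballots) and sum(ballots) / len(ballots) > min_agreement
            if confirmed:
                continue  # crowd agrees there is a spelling error
        kept.append(variant)
    return kept


# Hypothetical data for illustration.
dictionary = {"turn", "off", "the", "device", "shut", "down", "power"}
variants = ["turn off the device", "shut dwon", "power off the gadget"]
flagged = flag_variants(variants, dictionary)
votes = {
    "shut dwon": [True, True, False],
    "power off the gadget": [False, False, True],
}
kept = apply_crowd_review(variants, flagged, votes)
```

Here “power off the gadget” is flagged because “gadget” is missing from the small word list, but the crowd overrules the flag and the variant is retained, which is the reason flagged variants are not thrown out automatically.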
After variants have been processed through the spellchecking stage, the system may perform domain validation on those variants. During domain validation, the system may employ a crowdsourcing job in which the variants are displayed alongside several golden variants, which are a representative sample of relevant responses to the original scenarios that were presented to users during the previous elicitation phase. Users in the crowd are asked to decide whether the variant being evaluated belongs with the group of golden variants. This strategy allows untrained users in the crowd to accurately match responses to domains without having to explicitly describe the constraints of that domain to the user. It is easier for an untrained worker to decide whether a variant is similar to a collection of domain appropriate variants than it is for a worker to explicitly learn and apply the constraints of that domain.
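A domain-validation job of this kind might be modeled as below. This is a sketch under stated assumptions: the job layout, the boolean judgments, and the simple-majority acceptance rule are hypothetical and are not taken from the disclosure.

```python
def build_validation_jobs(variants, golden_variants):
    """Pair each candidate variant with the golden variants shown alongside it."""
    return [
        {"variant": variant, "golden_variants": list(golden_variants)}
        for variant in variants
    ]


def accepted_variants(jobs, judgments, min_share=0.5):
    """Keep variants that most reviewers judged to belong with the golden group.

    `judgments` maps a variant to a list of booleans (True = "belongs").
    """
    accepted = []
    for job in jobs:
        ballots = judgments.get(job["variant"], [])
        if ballots and sum(ballots) / len(ballots) > min_share:
            accepted.append(job["variant"])
    return accepted


# Hypothetical data for illustration.
golden = ["turn off", "shut down", "power down"]
candidates = ["switch yourself off", "what is the weather"]
jobs = build_validation_jobs(candidates, golden)
judgments = {
    "switch yourself off": [True, True, True],
    "what is the weather": [False, False, True],
}
in_domain = accepted_variants(jobs, judgments)
```

Reviewers only answer a yes/no similarity question per job; no description of the domain's constraints appears anywhere in the job data.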
Systems and methods for gathering text commands in response to a command context using a first crowdsourced job are discussed herein. A command context for a natural language processing system may be identified, where the command context is associated with a command context condition to provide commands to the natural language processing system. One or more command creators associated with one or more command creation devices may be selected. A first application on the one or more command creation devices may be configured to display command creation instructions for each of the one or more command creators to provide text commands that satisfy the command context, and to display a field for capturing a user-generated text entry to satisfy the command creation condition in accordance with the command creation instructions. Systems and methods for reviewing the text commands using second and third crowdsourced jobs are also presented herein.
In an implementation, the first application is configured to display a plurality of command contexts, and to capture, for each of the plurality of command contexts, a plurality of user-generated text entries.
Configuring the first application may comprise periodically displaying a non-machine readable caption on the first application. The non-machine readable caption may comprise one or more of a non-machine readable image and a non-machine readable sound bite.
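One way to picture the periodic display of a non-machine readable caption is as a challenge interleaved into the sequence of elicitation tasks. The sketch below assumes a fixed interval and a simple tuple representation; both are hypothetical choices for illustration.

```python
def build_task_sequence(command_contexts, challenge_interval=5):
    """Insert a challenge step after every `challenge_interval` elicitation tasks."""
    sequence = []
    for index, context in enumerate(command_contexts, start=1):
        sequence.append(("elicit", context))
        if index % challenge_interval == 0:
            # Placeholder for a non-machine readable image or sound bite.
            sequence.append(("challenge", None))
    return sequence


# Hypothetical contexts for illustration.
contexts = [f"context {n}" for n in range(1, 7)]
sequence = build_task_sequence(contexts, challenge_interval=3)
```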
In some implementations, one or more command reviewers associated with one or more command reviewer devices may be selected; and a second application on the one or more command reviewer devices may be configured with command review instructions for the one or more command reviewers to review the user-generated text entry. The command review instructions may instruct the one or more command reviewers to review the syntax of the user-generated text entry. The command review instructions may instruct the one or more command reviewers to compare the user-generated text entry with model text language variants for the command context. The second application may be configured to display the user-generated text entry and the command context alongside the model text language variants, and to receive a selection from the one or more command reviewers whether the user-generated text entry is as similar to the command context condition as one of the model text language variants.
In various implementations, syntax of the user-generated text entry may be checked. Moreover, the user-generated text entry may be stored in a transcription library. The user-generated text entry may be used for transcription of voice data during an end-user deployment phase of the natural language processing system.
These and other objects, features, and characteristics of the system and/or method disclosed herein, as well as the methods of operation and functions of the related objects of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and in the claims, the singular form of “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
Example of a System Architecture
The Structures of the Natural Language Processing Environment 100
The NLP end-user device(s) 102 may include one or more digital devices configured to provide an end-user with natural language transcription services. A “natural language transcription service,” as used herein, may include a service that converts audio contents into a textual format. A natural language transcription service may recognize words in audio contents, and may provide an end-user with a textual representation of those words. The natural language transcription service may be incorporated into an application, process, or run-time element that is executed on the NLP end-user device(s) 102. In an implementation, the natural language transcription service is incorporated into a mobile application that executes on the NLP end-user device(s) 102 or a process maintained by the operating system of the NLP end-user device(s) 102. In various implementations, the natural language transcription service may be incorporated into applications, processes, run-time elements, etc. related to technologies involving vehicles, technologies involving health care, technologies involving the military and/or law enforcement, technologies involving telephony, technologies that assist people with disabilities, etc. The natural language transcription service may be supported by the NLP command creation device(s) 104, the NLP command reviewer device(s) 106, and the NLP server 110, as discussed further herein.
The NLP end-user device(s) 102 may include components, such as memory and one or more processors, of a computer system. The memory may further include a physical memory device that stores instructions, such as the instructions referenced herein. The NLP end-user device(s) 102 may include one or more audio input components (e.g., one or more microphones), one or more display components (e.g., one or more screens), one or more audio output components (e.g., one or more speakers), etc. In some implementations, the audio input components of the NLP end-user device(s) 102 may receive audio content from an end-user, the display components of the NLP end-user device(s) 102 may display text corresponding to transcriptions of the audio contents, and the audio output components of the NLP end-user device(s) 102 may play audio contents to the end-user. It is noted that in various implementations, however, the NLP end-user device(s) 102 need not display transcribed audio contents, and may use transcribed audio contents in other ways, such as to provide commands that are not displayed on the NLP end-user device(s) 102, use application functionalities that are not displayed on the NLP end-user device(s) 102, etc. The NLP end-user device(s) 102 may include one or more of a networked phone, a tablet computing device, a laptop computer, a desktop computer, a server, or some combination thereof.
The NLP command creation device(s) 104 may include one or more digital devices configured to provide NLP command creators with command contexts and to receive text commands in response to those command contexts. An “NLP command creator,” as used herein, may refer to a person who creates text commands for command contexts using a mobile application executing on the NLP command creation device(s) 104. In various implementations, NLP command creators need not be trained (e.g., may be part of an unmanaged crowd) in providing text commands in response to command contexts. NLP command creators may be compensated (e.g., with incentives, points, virtual currencies, real currencies, etc.) for completing a specified number of jobs.
A “command context,” as used herein, may refer to a scenario (e.g., a set of conditions) for which an NLP end-user uses a natural language transcription service. Examples of command contexts include contexts related to controlling the NLP end-user device(s) 102 (e.g., controlling an application, process, device, etc. using voice data), contexts related to document generation and/or document editing (e.g., dictation tasks, etc.), and contexts related to searching documents and/or network resources (e.g., using voice data to perform Internet searches, etc.). A “text command,” as used herein, may refer to utterances (syllables, words, phrases, combinations of phrases, etc.) used by an NLP end-user within a specific command context. As an example, a text command related to a command context of Internet searches may include the word “search” (or variants thereof) followed by a sequence of words that provide parameters for the search. As another example, a text command related to controlling a vehicle may include the word “go” (or variants thereof) followed by a sequence of words that provide parameters for directions, speeds, etc.
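The shape of such a text command, a trigger word followed by parameters, can be illustrated with a small parser. This is only a sketch: the trigger list beyond “search”, the context label, and the returned dictionary layout are assumptions introduced for the example.

```python
# Longer triggers are listed first so "search for" wins over "search".
SEARCH_TRIGGERS = ("search for", "search", "look up", "find")


def parse_search_command(text):
    """Split a search-style text command into its trigger and parameters.

    Returns None when the text does not start with a known trigger.
    """
    normalized = " ".join(text.lower().split())
    for trigger in SEARCH_TRIGGERS:
        if normalized == trigger or normalized.startswith(trigger + " "):
            return {
                "context": "internet_search",
                "trigger": trigger,
                "parameters": normalized[len(trigger):].strip(),
            }
    return None


parsed = parse_search_command("Search for nearby coffee shops")
unrelated = parse_search_command("turn off the device")
```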
The NLP command creation device(s) 104 may include components, such as memory and one or more processors, of a computer system. The memory may further include a physical memory device that stores instructions, such as the instructions referenced herein. The NLP command creation device(s) 104 may include one or more audio input components (e.g., one or more microphones), one or more display components (e.g., one or more screens), one or more audio output components (e.g., one or more speakers), etc. The NLP command creation device(s) 104 may support a mobile application, process, etc. that is used to capture text commands during the training phase of the natural language processing environment 100. The mobile application on the NLP command creation device(s) 104 may display on a screen a command context and a request to provide text commands related to that command context. The NLP command creation device(s) 104 may include one or more of a networked phone, a tablet computing device, a laptop computer, a desktop computer, a server, or some combination thereof.
The NLP command reviewer device(s) 106 may include one or more digital devices configured to provide NLP command reviewers with the ability to review whether text commands provided by the NLP command creation device(s) 104 are similar to model text commands for a command context. In some implementations, the NLP command reviewer device(s) 106 provide NLP command reviewers with the ability to verify the accuracy of text commands provided by the NLP command creation device(s) 104 by checking syntax (spelling, grammar, etc.) of the text commands provided by the NLP command creation device(s) 104. In various implementations, NLP command reviewers need not be trained (e.g., may be part of an unmanaged crowd) in reviewing text commands, but rather may comprise ordinary users of the NLP command reviewer device(s) 106. NLP command reviewers may be compensated (e.g., with incentives, points, virtual currencies, real currencies, etc.) for completing a specified number of jobs.
The NLP command reviewer device(s) 106 may include components, such as memory and one or more processors, of a computer system. The memory may further include a physical memory device that stores instructions, such as the instructions referenced herein. The NLP command reviewer device(s) 106 may include one or more audio input components (e.g., one or more microphones), one or more display components (e.g., one or more screens), one or more audio output components (e.g., one or more speakers), etc. The NLP command reviewer device(s) 106 may support a mobile application, process, etc. that is used to review text commands during the training phase of the natural language processing environment 100. The NLP command reviewer device(s) 106 may include one or more of a networked phone, a tablet computing device, a laptop computer, a desktop computer, a server, or some combination thereof.
The network 108 may comprise any computer network. The network 108 may include a networked system that includes several computer systems coupled together, such as the Internet. The term “Internet” as used herein refers to a network of networks that uses certain protocols, such as the TCP/IP protocol, and possibly other protocols such as the hypertext transfer protocol (HTTP) for hypertext markup language (HTML) documents that make up the World Wide Web (the web). Content is often provided by content servers, which are referred to as being “on” the Internet. A web server, which is one type of content server, is typically at least one computer system which operates as a server computer system and is configured to operate with the protocols of the web and is coupled to the Internet. The physical connections of the Internet and the protocols and communication procedures of the Internet and the web are well known to those of skill in the relevant art. In various implementations, the network 108 may be implemented as a computer-readable medium, such as a bus, that couples components of a single computer together. For illustrative purposes, it is assumed the network 108 broadly includes, as understood from relevant context, anything from a minimalist coupling of the components illustrated in the example of
In various implementations, the network 108 may include technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 3G, 4G, CDMA, GSM, LTE, digital subscriber line (DSL), etc. The network 108 may further include networking protocols such as multiprotocol label switching (MPLS), transmission control protocol/Internet protocol (TCP/IP), User Datagram Protocol (UDP), hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), file transfer protocol (FTP), and the like. The data exchanged over the network 108 can be represented using technologies and/or formats including hypertext markup language (HTML) and extensible markup language (XML). In addition, all or some links can be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), and Internet Protocol security (IPsec). In some implementations, the network 108 comprises secure portions. The secure portions of the network 108 may correspond to networked resources managed by an enterprise, networked resources that reside behind a specific gateway/router/switch, networked resources associated with a specific Internet domain name, and/or networked resources managed by a common Information Technology (“IT”) unit.
The NLP server 110 may include one or more digital devices configured to support natural language transcription services. The NLP server 110 may include an NLP command management engine 112 and an end-user deployment engine 114.
The NLP command management engine 112 may manage collection from the NLP command creation device(s) 104 of text commands for one or more command contexts. The NLP command management engine 112 may also manage review by the NLP command reviewer device(s) 106 of text commands provided for the command contexts. In some implementations, the NLP command management engine 112 selects NLP command creators for a first crowdsourced job. The first crowdsourced job may provide NLP command creators with a command context, and may request the NLP command creators to provide text commands used for the command context.
In various implementations, the NLP command management engine 112 selects a first group of NLP command reviewers for a second crowdsourced job, and a second group of NLP command reviewers for a third crowdsourced job. The second crowdsourced job may comprise a request to review syntax (spelling, grammar, etc.) of text commands provided in response to the first crowdsourced job. The third crowdsourced job may comprise requesting the second group of NLP command reviewers to determine whether text commands are similar to model text commands for a specific command context. In some implementations, the NLP command management engine 112 provides the NLP command reviewer device(s) 106 with a menu that displays a command context, text commands for the command context provided by one of the NLP command creation device(s) 104, and model text commands for the command context. The menu may allow an NLP command reviewer to select whether or not the text commands provided by the NLP command creation device(s) 104 are similar to any of the model text commands.
The end-user deployment engine 114 may provide natural language transcription services to the NLP end-user device(s) 102 during an end-user deployment phase of the natural language processing environment 100. In various implementations, the end-user deployment engine 114 uses a transcription data datastore. Transcriptions in the transcription data datastore may have been initially generated during the transcription phase. The transcriptions in the transcription data datastore may comprise, for instance, text commands for a specific command context generated by the NLP command management engine 112 during the training phase of the natural language processing environment 100.
The Structures of the NLP Command Management Engine 112
The network interface engine 202 may be configured to send data to and receive data from the network 108. In some implementations, the network interface engine 202 is implemented as part of a network card (wired or wireless) that supports a network connection to the network 108. The network interface engine 202 may control the network card and/or take other actions to send data to and receive data from the network 108.
The mobile application management engine 204 may be configured to manage a mobile application on the NLP command creation device(s) 104 and the NLP command reviewer device(s) 106. The mobile application management engine 204 may provide installation instructions (e.g., an executable installation file, a link to an application store, etc.) to the NLP command creation device(s) 104 and/or the NLP command reviewer device(s) 106. The mobile application management engine 204 may further instruct a mobile application on the NLP command creation device(s) 104 to render one or more screens that provide NLP command creators with command contexts and request from the NLP command creators text commands corresponding to the command contexts. As an example, the screens may include a text display box that specifies a command context, and a text input box that receives from the NLP command creators text commands for the command context.
In some implementations, the mobile application management engine 204 instructs a mobile application on the NLP command reviewer device(s) 106 to render one or more screens that allow NLP command reviewers to review syntax of text commands. The mobile application management engine 204 may also instruct the mobile application on the NLP command reviewer device(s) 106 to render one or more screens that allow NLP command reviewers to determine whether an NLP command creator's text commands are similar to model text commands for a command context. In various implementations, for instance, the mobile application may display an NLP command creator's text command alongside model text commands for a specified command context. The mobile application may allow NLP command reviewers to select whether the NLP command creator's text command is similar to one or more of the model text commands. In various implementations, the mobile application does not require NLP command reviewers to specify the reasons the NLP command creator's text command is similar to one or more of the model text commands. Such implementations may present advantages to reviewing text commands, as untrained NLP command reviewers with an intuitive grasp of a language may know a text command represents a command context, but may not know the reason for the similarity to model text commands.
The command creation management engine 206 may manage instructions to the NLP command creation device(s) 104 to create text commands. In some implementations, the command creation management engine 206 creates crowdsourced jobs that provide the NLP command creation device(s) 104 with command contexts using command context data from the command context data datastore 214. The command creation management engine 206 may also select specific NLP command creators with user account data, e.g., user account data gathered from the user account data datastore 216 by the user management engine 208. The command creation management engine 206 may instruct the mobile application management engine 204 to deploy relevant crowdsourced jobs to the NLP command creation device(s) 104.
The user management engine 208 may gather user account data from the user account data datastore 216. In various implementations, the user management engine 208 selects NLP command creators for crowdsourced jobs related to NLP command creation. The user management engine 208 may also select NLP command reviewers for crowdsourced jobs related to command review processes. The user management engine 208 may provide user account data to the other modules of the NLP command management engine 112, such as the mobile application management engine 204.
The syntax engine 210 may evaluate syntax of text strings. In various implementations, the syntax engine 210 compares text commands with syntax in the dictionary to check spelling, grammar, and other attributes of the text commands. The syntax engine 210 may gather relevant reference syntax from the dictionary data datastore 220.
The command review management engine 212 may manage instructions to the NLP command reviewer device(s) 106 to review text commands. In some implementations, the command review management engine 212 creates crowdsourced jobs that provide the NLP command reviewer device(s) 106 with instructions to check syntax of text commands provided by NLP command creators. The command review management engine 212 may further remove specific text commands that NLP command reviewers have indicated should be removed (e.g., text commands that do not comply with syntax requirements, etc.). In some implementations, the command review management engine 212 creates crowdsourced jobs that request NLP command reviewers to compare text commands provided by NLP command creators with model text language variants for a specific command context. To this end, the command review management engine 212 may gather from the model text command data datastore 222 model text language variants for NLP command reviewers to compare with text commands provided by NLP command creators. The command review management engine 212 may further implement comparison thresholds between model text language variants and text commands provided by NLP command creators. The command review management engine 212 may store particular text commands provided by NLP command creators (e.g., those text commands provided by NLP command creators that have passed review of NLP command reviewers) in the transcription library 224.
The command creation management engine 206 may also select specific NLP command creators with user account data, e.g., user account data gathered from the user account data datastore 216 by the user management engine 208. The command creation management engine 206 may instruct the mobile application management engine 204 to deploy relevant crowdsourced jobs to the NLP command creation device(s) 104.
The command context data datastore 214 may store command context data. “Command context data,” as used herein, may refer to a data structure representative of one or more command contexts for the natural language processing environment 100. In some implementations, the command context data is indexed, ordered, etc. by a text string that represents a relevant command context. The command context data may be grouped into command context groups, where each command context group comprises similar command contexts.
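As a minimal sketch of how command context data might be indexed by a text string and grouped into command context groups, consider the following Python example. The class names, the `group` field, and the in-memory dictionaries are assumptions made for illustration only; the description does not specify a storage layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandContext:
    # Text string that represents the command context; used as the index key.
    key: str
    # Hypothetical label naming the group of similar command contexts.
    group: str


class CommandContextStore:
    """Illustrative in-memory stand-in for the command context data datastore 214."""

    def __init__(self):
        self._by_key = {}
        self._by_group = defaultdict(list)

    def add(self, context):
        # Replace any earlier entry stored under the same text string.
        old = self._by_key.get(context.key)
        if old is not None:
            self._by_group[old.group].remove(old)
        self._by_key[context.key] = context
        self._by_group[context.group].append(context)

    def get(self, key):
        # Look up a command context by its representative text string.
        return self._by_key[key]

    def group(self, name):
        # Return a copy of the command contexts in the named group.
        return list(self._by_group.get(name, []))
```

In this sketch, a lookup by text string and a lookup by group are both dictionary accesses, which is one simple way to satisfy the "indexed, ordered, etc." language above.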
The user account data datastore 216 may store user account data related to NLP command creators and NLP command reviewers. User account data may include usernames, first names, last names, email addresses, phone numbers, etc. In some implementations, the user account data includes user identifiers used to identify the NLP command creators and/or NLP command reviewers within the mobile application managed by the mobile application management engine 204.
The caption data datastore 218 may store non-machine readable captions. A “non-machine readable caption,” as used herein, may include an image of text, a soundbite of audio, etc. that is not readable by an automated computer (e.g., by a spambot or other bot, etc.). An example of a non-machine readable caption is a CAPTCHA caption or audio-challenge that requires a human being to interpret. The caption data may be used as part of a first crowdsourcing job, as discussed further herein.
The dictionary data datastore 220 may store a dictionary of appropriate syntax for text commands. The dictionary data datastore 220 may include the words and/or the variants of words commonly used in a particular language. The dictionary data datastore 220 may also include commonly occurring phrases and/or variants of phrases commonly used in a particular language. The dictionary data datastore 220 may be updated at a specified interval (periodically) or by an administrator of the NLP command management engine 112.
The model text command data datastore 222 may store model text commands. Each model text command may comprise text commands an end-user may provide in response to a command context. In some implementations, the model text commands are written by an administrator or other user who manages the NLP server 110. The model text commands may further comprise variants of text commands an end-user may provide in response to a command context. The model text commands may be indexed, ordered, etc. by text strings that represent a relevant command context.
The transcription library 224 may store text commands and command contexts. Each item of data stored in the transcription library 224 may represent a text command created by an NLP command creator, and reviewed by NLP command reviewers, using the techniques described herein. The transcription library 224 may be provided to the end-user deployment engine 114, during the end-user deployment phase of the natural language processing environment 100, as discussed further herein.
The Natural Language Processing Environment 100 in Operation
The natural language processing environment 100 may operate to provide NLP command creators with command contexts through a first crowdsourced job, and receive text commands in response to the first crowdsourced job. The natural language processing environment 100 may also operate to select NLP command reviewers for second and third crowdsourced jobs. In the second crowdsourced job, a first group of NLP command reviewers review the text commands provided by NLP command creators for syntax errors. In the third crowdsourced job, a second group of NLP command reviewers compare the text commands provided by NLP command creators with model text commands for the relevant command context.
Operation when Collecting Text Commands in Response to Command Contexts
In an implementation, the NLP command management engine 112 selects the NLP command creator(s) 302, and selects a command context. The NLP command management engine 112 may create a first crowdsourced job that incorporates instructions for the NLP command creator(s) 302 to provide text commands for the command context. At an operation 304, the first crowdsourced job is deployed to the network 108. At an operation 306, the first crowdsourced job is received by the NLP command creation device(s) 104. The NLP command creation device(s) 104 may display the command context to the NLP command creator(s) 302, and may request the NLP command creator(s) 302 to provide text commands for the command context.
At an operation 308, the NLP command creation device(s) 104 may provide the text commands to the network 108. At an operation 310, the NLP command management engine 112 may receive the text commands. The NLP command management engine 112 may create additional crowdsourced jobs for NLP command reviewers to review the text commands using the techniques described herein.
Operation when Reviewing Text Commands
In some implementations, the NLP command management engine 112 selects text commands to be reviewed. The NLP command management engine 112 may further incorporate the text commands into a second crowdsourced job for the first NLP command reviewer(s) 402(a). At an operation 404, the NLP command management engine 112 may provide the second crowdsourced job to the network 108. At an operation 406, the first NLP command reviewing device(s) 106(a) may receive the second crowdsourced job. The first NLP command reviewing device(s) 106(a) may ask the first NLP command reviewer(s) 402(a) to determine whether the text commands are syntactically correct. At an operation 408, the first NLP command reviewing device(s) 106(a) may provide a determination of whether the text commands are syntactically correct to the network 108. At an operation 410, the NLP command management engine 112 may receive the determination of whether the text commands are syntactically correct.
The NLP command management engine 112 may further select a second group of NLP command reviewers for a third crowdsourced job. The third crowdsourced job may ask the second group to determine whether the text command is sufficiently similar to model text commands for the command context. At an operation 412, the NLP command management engine 112 sends the third crowdsourced job to the network 108. At an operation 414, the second NLP command reviewer device(s) 106(b) may receive the third crowdsourced job. The second NLP command reviewer device(s) 106(b) may ask the second NLP command reviewer(s) 402(b) to determine whether the text command is sufficiently similar to model text commands for the command context. At an operation 416, the second NLP command reviewer device(s) 106(b) send a response to this determination to the network 108. At an operation 418, the NLP command management engine 112 may receive the response to this determination from the network 108. The NLP command management engine 112 may use this determination to store text variants that pass both the second and the third crowdsourced jobs, as discussed further herein.
At an operation 504, the end-user(s) 502 provide voice data to the NLP end-user device(s) 102. The NLP end-user device(s) 102 may capture the voice data using an audio input device thereon. The NLP end-user device(s) 102 may incorporate the voice data into network-compatible data transmissions, and at an operation 506, may send the network-compatible data transmissions to the network 108.
At an operation 508, the end-user deployment engine 118 may receive the network-compatible data transmissions. The end-user deployment engine 118 may further extract and transcribe the voice data using trained transcription libraries stored in the end-user deployment engine 118. More specifically, the end-user deployment engine 118 may identify validated transcription data corresponding to the voice data in trained transcription libraries. The end-user deployment engine 118 may incorporate the validated transcription data into network-compatible data transmissions. At an operation 510, the end-user deployment engine 118 may provide the validated transcription data to the network 108.
At an operation 512, the NLP end-user device(s) 102 may receive the validated transcription data. The NLP end-user device(s) 102 may further extract the validated transcription data from the network-compatible transmissions. At an operation 514, the NLP end-user device(s) 102 provide the validated transcription data to the end-user(s) 502. In some implementations, the NLP end-user device(s) 102 display the validated transcription data on a display component (e.g., a screen). The NLP end-user device(s) 102 may also use the validated transcription data internally (e.g., in place of keyboard input for a specific function or in a specific application/document).
At an operation 602, command contexts may be gathered for a training phase of the natural language processing environment 100. In some implementations, the command creation management engine 206 gathers command context data from the command context data datastore 214. The command context data may relate to a command context specified by an administrator, manager, etc. of the NLP server 110. The command context data may relate to a command context related to vehicles (directions, speed, or other parameters to control a vehicle, etc.), a command context related to health care, a command context related to a military and/or law enforcement application, a command context related to telephony, a command context related to helping people with disabilities (e.g., instructions for an application for the blind), etc. In some implementations, the command context may relate to instructions to control an application, process, etc. or instructions to search networked resources.
At an operation 604, NLP command creators are selected for a first crowdsourced job related to the command context. In some implementations, the command creation management engine 206 may instruct the user management engine 208 to identify user account data for people selected to be NLP command creators. The user management engine 208 may retrieve relevant user account data from the user account data datastore 216. The user management engine 208 may also provide the user account data of the NLP command creators to the command creation management engine 206.
At an operation 606, the first crowdsourced job may be configured with instructions for the NLP command creators to write text commands for the command contexts. More specifically, the command creation management engine 206 may configure the first crowdsourced job with a script that provides the command context alongside a data structure configured to receive text for the text commands.
At an operation 608, periodic captions may be inserted into the first crowdsourced job. In an implementation, the command creation management engine 206 may gather non-machine readable captions (e.g., CAPTCHA captions or audio-challenges that require a human being to interpret) from the caption data datastore 218. The command creation management engine 206 may incorporate these non-machine readable captions into the first crowdsourced job for the NLP command creators selected for the first crowdsourced job.
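Operations 606 and 608 can be illustrated with a short Python sketch that pairs each command context with an empty text-entry field and interleaves a non-machine readable caption at a fixed interval. The function name, the dictionary layout of each screen, and the `caption_every` parameter are hypothetical; the description does not state how often captions are inserted or how the job is serialized.

```python
def build_job_screens(command_contexts, captions, caption_every=3):
    """Return an ordered list of screens for a first crowdsourced job.

    Each prompt screen carries a command context and an empty field for the
    text command. After every `caption_every` prompts (an assumed interval),
    a caption screen is inserted, cycling through the available captions.
    """
    if caption_every < 1:
        raise ValueError("caption_every must be at least 1")

    screens = []
    caption_index = 0
    for count, context in enumerate(command_contexts, start=1):
        screens.append(
            {"type": "prompt", "command_context": context, "text_entry": ""}
        )
        if captions and count % caption_every == 0:
            caption = captions[caption_index % len(captions)]
            screens.append({"type": "caption", "caption": caption})
            caption_index += 1
    return screens
```

The captions here are opaque identifiers; in practice they would refer to images or audio challenges gathered from the caption data datastore 218.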
At an operation 610, the first crowdsourced job may be deployed to a mobile application used by the NLP command creators. In some implementations, the command creation management engine 206 may provide the first crowdsourced job to the mobile application management engine 204. The mobile application management engine 204 may incorporate the first crowdsourced job into the mobile application on the NLP command creation device(s) 104. In various implementations, the mobile application management engine 204 incorporates the instructions for the NLP command creators to write text commands for the command contexts and/or the periodic captions into specific screens of the mobile application. The mobile application management engine 204 may receive text commands in response to the first crowdsourced job, as described further herein. The process 600 may subsequently terminate.
At an operation 702, a text response to the first crowdsourced job may be received from the NLP command creators. In various implementations, the mobile application management engine 204 may receive text commands NLP command creators entered into the mobile application on the NLP command creation device(s) 104. The mobile application management engine 204 may provide these text commands to the other modules of the NLP command management engine 112 for further processing. For example, in an implementation, the mobile application management engine 204 provides the text commands to the syntax engine 210.
At an operation 704, the text command may be provided to an automated syntax checker. More specifically, the mobile application management engine 204 may provide the text command to the syntax engine 210. The syntax engine 210 may evaluate the text commands for syntax errors in accordance with the dictionary data in the dictionary data datastore 220. In some implementations, the syntax engine 210 checks the text commands for spelling, grammar, and/or other syntax errors. The syntax engine 210 may provide whether specific text commands pass the syntax check to other modules of the NLP command management engine 112, such as the command review management engine 212.
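A minimal sketch of the automated check at operation 704 follows, assuming the dictionary data is available as a set of lowercase words and that the check is limited to spelling. The function names and the tokenization rule are assumptions for illustration; the grammar and other syntax checks mentioned above are not modeled.

```python
import re


def unknown_tokens(text_command, dictionary):
    """Return the tokens of a text command that are absent from the dictionary."""
    # Lowercase the command and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text_command.lower())
    return [token for token in tokens if token not in dictionary]


def passes_syntax_check(text_command, dictionary):
    """Pass only commands that contain at least one token and no unknown tokens."""
    tokens = re.findall(r"[a-z']+", text_command.lower())
    return bool(tokens) and not unknown_tokens(text_command, dictionary)
```

Under these assumptions, a command such as "sup" fails the automated check when "sup" is not in the dictionary, which is the kind of variant that the later crowdsourced review is meant to examine.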
At an operation 706, language variants of the text command that do not pass the automated syntax checker may be identified. More particularly, the command review management engine 212 may identify language variants of the text command that fail spelling, grammar, and other syntax checks. As discussed further herein, these variants of the text commands may be used to form the basis of additional crowdsourced jobs.
At an operation 708, a second crowdsourced job requesting a first group of NLP command reviewers to perform a syntax check of the language variants may be created and/or deployed. The command review management engine 212 may create a second crowdsourced job that selects a first group of the NLP command reviewers, and instructs the first group of NLP command reviewers to check the syntax of the language variants. The mobile application management engine 204 may instruct the mobile application on the NLP command review device(s) 106 to display on the NLP command review device(s) 106 a text editing window that allows the NLP command reviewers to manually check syntax of the language variants. Doing so allows people, rather than a machine, to review whether a text command contains a known variant of a word (e.g., “sup” instead of “what's up”) or contains a syntax error. The mobile application management engine 204 may receive responses from the first group of the NLP command reviewers.
At an operation 710, language variants that fail the syntax check by NLP command reviewers may be removed. More particularly, the command review management engine 212 may remove language variants that fail the syntax check by NLP command reviewers. As discussed herein, these language variants have been identified as containing syntax errors by both automated syntax checkers (e.g., the syntax engine 210) and by people (e.g., the first group of NLP command reviewers). As a result, these language variants can be removed with a high confidence that they contain syntax errors.
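The two-stage filtering of operations 704 through 710 can be sketched as follows. The `triage` function and its two callables are hypothetical: `automated_pass` stands in for the syntax engine 210, and `reviewer_pass` stands in for the aggregated responses of the first group of NLP command reviewers, whose aggregation rule is not specified in the description.

```python
def triage(text_commands, automated_pass, reviewer_pass):
    """Split text commands into kept and removed lists.

    A command is removed only when it fails the automated check and also
    fails the human syntax check. The human check is consulted only for
    commands that the automated check has flagged.
    """
    kept, removed = [], []
    for command in text_commands:
        # `or` short-circuits, so reviewers see only the flagged commands.
        if automated_pass(command) or reviewer_pass(command):
            kept.append(command)
        else:
            removed.append(command)
    return kept, removed
```

Requiring both checks to fail before removal mirrors the reasoning above: a variant rejected by both a machine and people can be discarded with high confidence.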
At an operation 712, a third crowdsourced job requesting a second group of NLP command reviewers to compare the flagged language variants with model language variants of the text command may be created and/or deployed. In some implementations, the command review management engine 212 creates a third crowdsourced job, and selects a second group of NLP command reviewers for the third crowdsourced job. The command review management engine 212 may gather model text commands from the model text command data datastore 222 and may provide the model text commands, along with the text command provided by the NLP command creator, to the second group of NLP command reviewers. The mobile application management engine 204 may incorporate the model commands alongside the text command provided by the NLP command creator in a window in the mobile application on the NLP command reviewer device(s) 106. In some implementations, NLP command reviewers are provided (a) the command context, and (b) a side-by-side comparison of the model text commands and the text command provided by the NLP command creator. NLP command reviewers may be given the option to select whether the text command provided by the NLP command creator is sufficiently similar to model text commands given the command context. The mobile application management engine 204 may receive responses to the third crowdsourced job, and may provide these responses to the command review management engine 212.
At an operation 714, each of the comparisons of the flagged language variants and the model language variants may be scored. The command review management engine 212 may assign scores reflecting whether the second group of the NLP command reviewers found the text command provided by the NLP command creator to be sufficiently similar to model text commands given the command context. At an operation 716, it may be determined whether one or more of the comparisons exceed a comparison threshold. More particularly, the command review management engine 212 may make the determination about whether a threshold has been exceeded.
At an operation 718, the language variants associated with comparisons that exceed the comparison threshold may be stored in a transcription library. In some implementations, the command review management engine 212 may store the language variants associated with comparisons that exceed the comparison threshold in the transcription library 224. The process 700 may subsequently terminate.
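Operations 714 through 718 can be illustrated with the following Python sketch. The scoring rule (the fraction of reviewers who selected "sufficiently similar"), the default threshold of 0.5, and the dictionary used as a stand-in for the transcription library 224 are all assumptions; the description does not define how scores are computed or what threshold value is used.

```python
def comparison_score(votes):
    """Score a comparison as the fraction of reviewers who voted 'similar'."""
    if not votes:
        return 0.0
    return sum(1 for vote in votes if vote) / len(votes)


def store_passing_variants(votes_by_variant, command_context, library, threshold=0.5):
    """Store language variants whose score exceeds the comparison threshold.

    `votes_by_variant` maps each language variant to a list of booleans, one
    per reviewer. `library` maps a command context to its stored variants.
    """
    for variant, votes in votes_by_variant.items():
        # "Exceed" is read strictly: a score equal to the threshold does not pass.
        if comparison_score(votes) > threshold:
            library.setdefault(command_context, []).append(variant)
    return library
```

With three reviewers, for example, a variant judged similar by two of them scores about 0.67 and is stored, while a variant judged similar by one of two reviewers scores exactly 0.5 and is not.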
In some implementations, the screen in
The computer 902 interfaces to external systems through the communications interface 910, which can include a modem or network interface. It will be appreciated that the communications interface 910 can be considered to be part of the computer system 900 or a part of the computer 902. The communications interface 910 can be an analog modem, ISDN modem, cable modem, token ring interface, satellite transmission interface (e.g. “direct PC”), or other interfaces for coupling a computer system to other computer systems.
The processor 908 can be, for example, a conventional microprocessor such as an Intel Pentium microprocessor or Motorola power PC microprocessor. The memory 912 is coupled to the processor 908 by a bus 920. The memory 912 can be Dynamic Random Access Memory (DRAM) and can also include Static RAM (SRAM). The bus 920 couples the processor 908 to the memory 912, also to the non-volatile storage 916, to the display controller 914, and to the I/O controller 918.
The I/O devices 904 can include a keyboard, disk drives, printers, a scanner, and other input and output devices, including a mouse or other pointing device. The display controller 914 can control in the conventional manner a display on the display device 906, which can be, for example, a cathode ray tube (CRT) or liquid crystal display (LCD). The display controller 914 and the I/O controller 918 can be implemented with conventional well known technology.
The non-volatile storage 916 is often a magnetic hard disk, an optical disk, or another form of storage for large amounts of data. Some of this data is often written, by a direct memory access process, into memory 912 during execution of software in the computer 902. One of skill in the art will immediately recognize that the terms “machine-readable medium” or “computer-readable medium” include any type of storage device that is accessible by the processor 908 and also encompass a carrier wave that encodes a data signal.
The computer system 900 is one example of many possible computer systems which have different architectures. For example, personal computers based on an Intel microprocessor often have multiple buses, one of which can be an I/O bus for the peripherals and one that directly connects the processor 908 and the memory 912 (often referred to as a memory bus). The buses are connected together through bridge components that perform any necessary translation due to differing bus protocols.
Network computers are another type of computer system that can be used in conjunction with the teachings provided herein. Network computers do not usually include a hard disk or other mass storage, and the executable programs are loaded from a network connection into the memory 912 for execution by the processor 908. A Web TV system, which is known in the art, is also considered to be a computer system, but it can lack some of the features shown in
Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Techniques described in this paper relate to apparatus for performing the operations. The apparatus can be specially constructed for the required purposes, or it can comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program can be stored in a computer readable storage medium, such as, but not limited to, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
For purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the description. It will be apparent, however, to one skilled in the art that implementations of the disclosure can be practiced without these specific details. In some instances, modules, structures, processes, features, and devices are shown in block diagram form in order to avoid obscuring the description. In other instances, functional block diagrams and flow diagrams are shown to represent data and logic flows. The components of block diagrams and flow diagrams (e.g., modules, blocks, structures, devices, features, etc.) may be variously combined, separated, removed, reordered, and replaced in a manner other than as expressly described and depicted herein.
Reference in this specification to “one implementation”, “an implementation”, “some implementations”, “various implementations”, “certain implementations”, “other implementations”, “one series of implementations”, or the like means that a particular feature, design, structure, or characteristic described in connection with the implementation is included in at least one implementation of the disclosure. The appearances of, for example, the phrase “in one implementation” or “in an implementation” in various places in the specification are not necessarily all referring to the same implementation, nor are separate or alternative implementations mutually exclusive of other implementations. Moreover, whether or not there is express reference to an “implementation” or the like, various features are described, which may be variously combined and included in some implementations, but also variously omitted in other implementations. Similarly, various features are described that may be preferences or requirements for some implementations, but not other implementations.
The language used herein has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the implementations is intended to be illustrative, but not limiting, of the scope, which is set forth in the claims recited herein.
Claims
1. A computer-implemented method, the method being implemented in a computer system having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, cause the computer system to perform the method, the method comprising:
- identifying a command context for a natural language processing system, the command context associated with a command context condition to provide commands to the natural language processing system;
- selecting one or more command creators, the one or more command creators being associated with one or more command creation devices; and
- configuring a first application on the one or more command creation devices to display the command context, to display command creation instructions for each of the one or more command creators to provide text commands that satisfy the command context, and to display a field for capturing a user-generated text entry to satisfy the command creation condition in accordance with the command creation instructions.
2. The method of claim 1, wherein the first application is configured to display a plurality of command contexts, and to capture, for each of the plurality of command contexts, a plurality of user-generated text entries.
3. The method of claim 1, wherein configuring the first application comprises periodically displaying a non-machine readable caption on the first application.
4. The method of claim 1, wherein the non-machine readable caption comprises one or more of a non-machine readable image and a non-machine readable sound bite.
5. The method of claim 1, further comprising:
- selecting one or more command reviewers, the one or more command reviewers being associated with one or more command reviewer devices; and
- configuring a second application on the one or more command reviewer devices with command review instructions for the one or more command reviewers to review the user-generated text entry.
6. The method of claim 5, wherein the command review instructions instruct the one or more command reviewers to review the syntax of the user-generated text entry.
7. The method of claim 5, wherein the command review instructions instruct the one or more command reviewers to compare the user-generated text entry with model text language variants for the command context.
8. The method of claim 7, wherein the second application is configured to display the user-generated text entry and the command context alongside the model text language variants, and to receive a selection from the one or more command reviewers whether the user-generated text entry is as similar to command context condition as one of the model text language variants.
9. The method of claim 1, further comprising checking syntax of the user-generated text entry.
10. The method of claim 1, further comprising storing the user-generated text entry in a transcription library.
11. The method of claim 1, further comprising using the user-generated text entry for transcription of voice data during an end-user deployment phase of the natural language processing system.
12. A system comprising:
- memory;
- one or more physical processors programmed with one or more computer program instructions which, when executed, cause the one or more physical processors to: identify a command context for a natural language processing system, the command context associated with a command context condition to provide commands to the natural language processing system; select one or more command creators, the one or more command creators being associated with one or more command creation devices; and configure a first application on the one or more command creation devices to display the command context, to display command creation instructions for each of the one or more command creators to provide text commands that satisfy the command context, and to display a field for capturing a user-generated text entry to satisfy the command creation condition in accordance with the command creation instructions.
13. The system of claim 12, wherein the first application is configured to display a plurality of command contexts, and to capture, for each of the plurality of command contexts, a plurality of user-generated text entries.
14. The system of claim 12, wherein configuring the first application comprises periodically displaying a non-machine readable caption on the first application.
15. The system of claim 12, wherein the non-machine readable caption comprises one or more of a non-machine readable image and a non-machine readable sound bite.
16. The system of claim 12, wherein the instructions which, when executed, further cause the one or more physical processors to:
- select one or more command reviewers, the one or more command reviewers being associated with one or more command reviewer devices; and
- configure a second application on the one or more command reviewer devices with command review instructions for the one or more command reviewers to review the user-generated text entry.
17. The system of claim 16, wherein the command review instructions instruct the one or more command reviewers to review the syntax of the user-generated text entry.
18. The system of claim 16, wherein the command review instructions instruct the one or more command reviewers to compare the user-generated text entry with model text language variants for the command context.
19. The system of claim 18, wherein the second application is configured to display the user-generated text entry and the command context alongside the model text language variants, and to receive a selection from the one or more command reviewers whether the user-generated text entry is as similar to command context condition as one of the model text language variants.
20. The system of claim 12, further comprising checking syntax of the user-generated text entry.
21. The system of claim 12, further comprising storing the user-generated text entry in a transcription library.
22. The system of claim 12, further comprising using the user-generated text entry for transcription of voice data during an end-user deployment phase of the natural language processing system.
23. A computer program product comprising:
- one or more tangible, non-transitory computer-readable storage devices;
- program instructions, stored on at least one of the one or more tangible, non-transitory computer-readable tangible storage devices that, when executed, cause a computer to: identify a command context for a natural language processing system, the command context associated with a command context condition to provide commands to the natural language processing system; select one or more command creators, the one or more command creators being associated with one or more command creation devices; and configure a first application on the one or more command creation devices to display the command context, to display command creation instructions for each of the one or more command creators to provide text commands that satisfy the command context, and to display a field for capturing a user-generated text entry to satisfy the command creation condition in accordance with the command creation instructions.
Type: Application
Filed: Oct 9, 2017
Publication Date: Feb 1, 2018
Applicant: VoiceBox Technologies Corporation (Bellevue, WA)
Inventors: Spencer John ROTHWELL (Seattle, WA), Daniela BRAGA (Bellevue, WA), Ahmad Khamis ELSHENAWY (Lynnwood, WA), Stephen Steele CARTER (Seattle, WA)
Application Number: 15/728,176