TASK PERFORMANCE ADJUSTMENT BASED ON VIDEO ANALYSIS

A method of adjusting task performance includes extracting image data and audio data from video data and extracting semantic text data from the audio data. The method further includes analyzing at least one of the image data, the audio data, and the semantic text data to identify a first set of features, generating an adjustment recommendation based on the first set of features and a relational feature model, and outputting the adjustment recommendation. The video data portrays a first individual performing a first iteration of a task. The at least one of the image data, audio data, and semantic text data is analyzed by a first computer-implemented machine learning model. The adjustment recommendation is generated by a second computer-implemented machine learning model and comprises instructions that can be performed by the first individual to adjust task performance. The relational feature model relates features and task performance.

Description
CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims the benefit of U.S. Provisional Application No. 63/286,844 filed Dec. 7, 2021 for “MACHINE LEARNING METHOD TO QUANTIFY PRESENT STATE-OF-MIND AND PREDICT FUTURE STATE-OF-MIND OF ONE OR MORE INDIVIDUALS BASED ON VIDEO IMAGES OF THOSE INDIVIDUALS” by M. Griffin, H. Kotvis, K. Lumb, K. Poulson, and J. Miner, the disclosure of which is incorporated in its entirety by reference herein; of U.S. Provisional Application 63/405,724 filed Sep. 12, 2022 for “TASK PERFORMANCE ADJUSTMENT BASED ON VIDEO ANALYSIS” by M. Griffin, the disclosure of which is incorporated in its entirety by reference herein; of U.S. Provisional Application 63/405,709 filed Sep. 12, 2022 for “ADJUSTING MENTAL STATE TO IMPROVE TASK PERFORMANCE” by M. Griffin, the disclosure of which is incorporated in its entirety by reference herein; of U.S. Provisional Application 63/405,712 filed Sep. 12, 2022 for “ADJUSTING MENTAL STATE TO IMPROVE TASK PERFORMANCE AND COACHING IMPROVEMENT” by M. Griffin, the disclosure of which is incorporated in its entirety by reference herein; and of U.S. Provisional Application 63/405,714 filed Sep. 12, 2022 for “ADJUSTING MENTAL STATE TO IMPROVE TASK PERFORMANCE AND COACHING IMPROVEMENT” by M. Griffin, the disclosure of which is also incorporated in its entirety by reference herein.

BACKGROUND

The present disclosure relates to task performance adjustment and, more particularly, to systems and methods for providing recommendations for adjusting task performance using video data.

Many individuals are required to perform tasks that involve communicative elements as part of their jobs. It can be difficult for individuals to improve their abilities to communicate based solely on audience feedback. Although individuals can attempt to improve their ability to perform a task that includes communicative elements by seeking out instruction from a more experienced communicator, it can be difficult for conventional instruction to provide tangible improvements or adjustments to communication skill. Further, some individuals have impairments or disabilities that can significantly decrease the ability of the individual to use audience feedback or other conventional metrics to understand how effective they are at communicating.

SUMMARY

According to one aspect of the present disclosure, a method of adjusting task performance includes extracting image data and audio data from video data and extracting semantic text data from the audio data. The method further includes analyzing at least one of the image data, the audio data, and the semantic text data to identify a first set of features, generating an adjustment recommendation based on the first set of features and a relational feature model, and outputting the adjustment recommendation. The video data portrays a first individual performing a first iteration of a task. The at least one of the image data, audio data, and semantic text data is analyzed by a first computer-implemented machine learning model. The adjustment recommendation is generated by a second computer-implemented machine learning model and comprises instructions that can be performed by the first individual to adjust task performance. The relational feature model relates features and task performance.

According to another aspect of the present disclosure, a system for adjusting task performance includes a processor, a user interface, and memory. The user interface is configured to enable an operator to interact with the processor. The memory is encoded with instructions that, when executed, cause the processor to acquire video data of a first individual performing a first iteration of the task, extract image data and audio data from the video data, extract semantic text data from the audio data, and analyze at least one of the image data, the audio data, and the semantic text data with a first computer-implemented machine learning model to identify a first set of features. The instructions further cause the processor to generate an adjustment recommendation based on the first set of features and a relational feature model and cause the user interface to output the adjustment recommendation. The relational feature model relates features and task performance, and the adjustment recommendation comprises instructions that can be performed by the first individual to adjust task performance.

According to yet a further aspect of the present disclosure, a method of scoring task performance includes extracting image data and audio data from video data and extracting semantic text data from the audio data. The method further includes analyzing at least one of the image data, the audio data, and the semantic text data to identify a first set of features, generating a performance score based on the first set of features and a relational feature model, and outputting the performance score. The video data portrays a first individual performing a first iteration of a task. The at least one of the image data, audio data, and semantic text data is analyzed by a first computer-implemented machine learning model. The performance score is generated by a second computer-implemented machine learning model and comprises one or more alphanumeric characters that describe the performance of the first individual.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic view of an example of a mental state classifier.

FIG. 2 is a flow diagram of an example of a method of improving task performance.

FIG. 3 is a flow diagram of an example of a method of identifying features from audio data for use with the method of FIG. 2.

FIG. 4 is a flow diagram of an example of a method of generating a reference feature set for use with the method of FIG. 2.

DETAILED DESCRIPTION

The present disclosure relates to systems and methods for providing task performance adjustment to an individual performing the task. Specifically, the present disclosure provides methods for providing performance scores and adjustment recommendations to an individual based on video analysis of the individual performing the task. As will be explained in more detail subsequently, the methods and systems described herein can be performed on a wide variety of tasks, including tasks having communication elements and tasks lacking communication elements.

As used herein, a “task” refers to any work, job, duty, labor, chore, or project that can be performed by an individual. The task can have audio elements that can be captured by, for example, a microphone and/or visual elements that can be captured by, for example, a camera. In some examples, the task includes communication elements. Examples of tasks that have communication elements include, for example, lying, acting, lecturing, public speaking, and teaching. In other examples, the task can lack communication elements. For example, the task can include non-vocal acting, dancing, or another skilled task having movement-based elements.

Existing methods of adjusting task performance often rely on self-study or on instruction provided by more experienced practitioners of the task. In some examples, existing methods can be impractical. In other examples, existing methods can provide differing and, in some cases, conflicting information to an individual, reducing the ability of an individual to adjust and/or improve their ability to perform the task. Conversely, the methods and systems described herein use pre-existing examples of effective task performance to provide scores describing an individual's task performance and/or recommendations for adjusting the individual's task performance to be more like the pre-existing examples of effective task performance. Advantageously, this does not require an individual to reconcile multiple opinions provided by multiple sources, such as written sources, recorded sources, and/or instruction by a more experienced practitioner of the task.

FIG. 1 is a schematic view of performance adjustment system 100, which is a hardware device configured to implement one or more machine learning models used to perform the methods described herein and to produce improvement recommendations and/or performance scores that can be used for coaching. In the depicted example, performance adjustment system 100 includes processor 102, memory 104, and user interface 106 and is connected to camera devices 108A-N. Camera devices 108A-N capture video data 110A-N of individuals 112A-N. Memory 104 includes video processing module 120, feature extraction module 130, performance adjustment module 140, and performance scoring module 150.

Processor 102 can execute software, applications, and/or programs stored on memory 104. Examples of processor 102 can include one or more of a processor, a microprocessor, a controller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other equivalent discrete or integrated logic circuitry. Processor 102 can be entirely or partially mounted on one or more circuit boards.

Memory 104 is configured to store information and, in some examples, can be described as a computer-readable storage medium. In some examples, a computer-readable storage medium can include a non-transitory medium. The term “non-transitory” can indicate that the storage medium is not embodied in a carrier wave or a propagated signal. In certain examples, a non-transitory storage medium can store data that can, over time, change (e.g., in RAM or cache). In some examples, memory 104 is a temporary memory. As used herein, a temporary memory refers to a memory having a primary purpose that is not long-term storage. Memory 104, in some examples, is described as volatile memory. As used herein, a volatile memory refers to a memory that does not maintain stored contents when power to memory 104 is turned off. Examples of volatile memories can include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories. In some examples, memory 104 is used to store program instructions for execution by processor 102. Memory 104, in one example, is used by software or applications running on the performance adjustment system (e.g., by a computer-implemented machine learning model or a data processing module) to temporarily store information during program execution.

Memory 104, in some examples, also includes one or more computer-readable storage media. The memory can be configured to store larger amounts of information than volatile memory. The memory can further be configured for long-term storage of information. In some examples, the memory includes non-volatile storage elements. Examples of such non-volatile storage elements can include, for example, magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.

User interface 106 is an input and/or output device and enables an operator to control operation of performance adjustment system 100. For example, user interface 106 can be configured to receive inputs from an operator and/or provide outputs regarding task performance adjustments and evaluations of task performance. User interface 106 can include one or more of a sound card, a video graphics card, a speaker, a display device (such as a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, etc.), a vibration or rumble motor, an accelerometer, a touchscreen, a keyboard, a mouse, a joystick, or other type of device for facilitating input and/or output of information in a form understandable to users and/or machines.

Performance adjustment system 100 is configured to perform one or more methods described herein. Performance adjustment system 100 can accept data from and/or can be operably connected to an audiovisual data stream and/or an audiovisual data file. Performance adjustment system 100 can use data from an audiovisual data stream and/or an audiovisual data file to determine performance adjustment information. More generally, performance adjustment system 100 is configured to perform any of the functions attributed herein to a performance adjustment system, including receiving an output from any source referenced herein, detecting any condition or event referenced herein, and generating and providing data and information as referenced herein.

Performance adjustment system 100 can be a discrete assembly or be formed by one or more devices capable of individually or collectively implementing functionalities and generating and outputting data as discussed herein. In some examples, performance adjustment system 100 can be implemented as a plurality of discrete circuitry subassemblies. In some examples, performance adjustment system 100 can include or be implemented at least in part as a smartphone or tablet, among other options. In some examples, performance adjustment system 100 and/or user interface 106 of performance adjustment system 100 can include and/or be implemented as downloadable software in the form of a mobile application. The mobile application can be implemented on a computing device, such as a personal computer, tablet, or smartphone, among other suitable devices. Performance adjustment system 100 can be considered to form a single computing device even when distributed across multiple component devices.

Camera devices 108A-N are capable of capturing video data 110A-N of one or more individuals 112A-N. In the depicted example, each camera device 108A-N captures video data 110A-N of a single individual 112A-N. In other examples, each camera device 108A-N captures video data 110A-N of multiple individuals 112A-N. Each camera device 108A-N is configured to be able to communicate with performance adjustment system 100, and performance adjustment system 100 is configured to communicate with each camera device 108A-N. Camera devices 108A-N can be, for example, a video camera, a webcam, or another suitable source for obtaining video data 110A-N. Camera devices 108A-N can be controlled by performance adjustment system 100 or by another suitable device. Video data 110A-N are audiovisual data feeds portraying individuals 112A-N. Video data 110A-N can be stored to memory 104 for use with one or more methods described herein or can be stored to another storage media and recalled to memory 104 for use with one or more methods described herein.

Although performance adjustment system 100 is depicted as connected to only three camera devices 108A-N, performance adjustment system 100 can be connected to any number of camera devices 108A-N. Each additional camera device 108A-N can capture video data 110A-N portraying another individual 112A-N. Similarly, although each of video data 110A-N is depicted as portraying a single individual 112A-N, in other examples each of video data 110A-N can depict two or more individuals 112A-N.

Video processing module 120 includes one or more programs for processing video data 110A-N. For example, video processing module 120 can include one or more programs for extracting image data, audio data, and semantic text data from video data 110A-N. As used herein, “image data” refers to the portion of video data 110A-N that is a series of still images, “audio data” refers to the sound data stored in video data 110A-N, and semantic text data refers to data that represents spoken words, phrases, sentences, and other sounds produced by the individual as readable text.
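For illustration only, the following is a minimal sketch of how a video processing module might separate image data and audio data from a video file, assuming the OpenCV Python bindings and a system ffmpeg binary are available; the file paths, frame stride, and audio format are hypothetical choices, not requirements of the disclosure.

```python
# Illustrative sketch of video processing (not the disclosed implementation).
# Assumes OpenCV (cv2) and an ffmpeg binary on the PATH; paths are placeholders.
import subprocess
import cv2

def extract_image_data(video_path, frame_stride=10):
    """Return every frame_stride-th still image from the video as numpy arrays."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_stride == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

def extract_audio_data(video_path, audio_path="audio.wav"):
    """Write the audio track to a mono 16 kHz WAV file and return its path."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )
    return audio_path
```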

Feature extraction module 130 includes one or more programs for classifying the image data, audio data, and semantic text data extracted by video processing module 120. Feature extraction module 130 can include one or more programs for extracting classifiable features from the image data, audio data, and/or semantic text data. In some examples, feature extraction module 130 can include one or more computer-implemented machine learning models for extracting classifiable features from the image data, audio data, and/or semantic text data. The features extracted by feature extraction module 130 are capable of being classified to evaluate the individual's task performance.

Performance adjustment module 140 includes one or more programs for evaluating task performance based on the features extracted by feature extraction module 130. Performance adjustment module 140 is configured to evaluate task performance by comparing features extracted by feature extraction module 130 with a relational feature model that relates features to task performance. In some examples, the relational feature model is a computer-implemented machine learning model trained to recognize features associated with positive task performance and/or features associated with negative task performance. Performance adjustment module 140 also includes one or more programs for generating an adjustment recommendation. The adjustment recommendation can be used by the individual to improve task performance in subsequent iterations of the task.

Performance scoring module 150 includes one or more programs for generating a performance score representative of task performance recorded and/or captured in one of video data 110A-N. Specifically, performance scoring module 150 is configured to generate a score based on the features extracted by feature extraction module 130 and the relational feature model used by performance adjustment module 140. The score is representative of the individual's performance and can be output as one or more alphanumeric values.

FIG. 2 is a flow diagram of method 200, which is a method of generating performance adjustment recommendations. Method 200 includes steps 202-236 of acquiring video data (step 202), extracting image data (step 204), generating a first feature set (step 206), extracting audio data (step 214), generating a second feature set (step 216), extracting semantic text data (step 224), generating a third feature set (step 226), generating an adjustment recommendation (step 230), generating a performance score (step 232), outputting the adjustment recommendation (step 234), and outputting a performance score (step 236). Steps 202, 204, 214, and 224 can be performed by, for example, video processing module 120 of performance adjustment system 100 (FIG. 1). Steps 206, 216, and 226 can be performed by, for example, feature extraction module 130 of performance adjustment system 100 (FIG. 1). Steps 230 and 232 can be performed by, for example, performance adjustment module 140 and performance scoring module 150, respectively, of performance adjustment system 100 (FIG. 1).

In step 202, video data is acquired by processor 102. The video data can be of any length and can be any media source having audio and/or image components. The video data can be delivered to performance adjustment system 100 from a video source and/or performance adjustment system 100 can request the video data from the video source. The video source can be any suitable source of video, such as a multimedia file or a video stream. The video source can also be, for example, video data 110A-N captured by cameras 108A-N (FIG. 1). In some examples, the video source can be a videoconferencing platform.

The video data acquired in step 202 contains footage of at least one individual performing a task. The task can include communication elements or can lack communication elements. For example, the task can be one of lying, acting, lecturing, public speaking, teaching, and dancing. The video data can include footage depicting a complete iteration of the task or can depict only a portion of the task performance. In some examples where the video data contains a complete iteration of the task, the video can be segmented and task performance can be analyzed via method 200 for each segment. Additionally and/or alternatively, the video data can contain multiple iterations of the individual performing the task and task performance can be analyzed for each iteration using method 200.

In some examples, processor 102 can be configured to detect when the video data depicts task performance and to sample only the portions of the video data that depict task performance into one or more video segments for use with method 200. In other examples, the video data is sampled at pre-determined intervals into one or more segments for use with method 200. Where the video data is sampled into segments, method 200 can be performed for each sampled segment of the video data.
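As one non-limiting illustration of sampling at pre-determined intervals, the following sketch computes segment boundaries for a video of known duration; the 30-second interval is an assumed value.

```python
# Minimal sketch of fixed-interval segmentation; the interval length is illustrative.
def segment_intervals(duration_s, interval_s=30.0):
    """Return (start, end) times, in seconds, covering the full video."""
    segments = []
    start = 0.0
    while start < duration_s:
        segments.append((start, min(start + interval_s, duration_s)))
        start += interval_s
    return segments

# segment_intervals(95.0) -> [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0), (90.0, 95.0)]
```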

In some examples, the video data acquired in step 202 can contain video depicting more than one individual. As will be explained in more detail subsequently, image and/or audio data for each individual can be separated for separate analysis using method 200. Where more than one type of data is used to assess task performance (i.e., more than one of image, audio, or semantic text data), feature data for each data type can be used in steps 230 and 232, as will be discussed subsequently in more detail.

In step 204, processor 102 extracts images of the individual from the video acquired in step 202. Processor 102 can use programs of video processing module 120 to perform step 204. The extracted images are stored to memory 104 as still images. Processor 102 can be configured to identify an individual performing the task in the image data and only store image data that corresponds to task performance to memory 104 as the extracted image data. For example, processor 102 can execute a computer-implemented machine learning model trained to identify image data that depicts an individual performing the task and, accordingly, trim the complete image data to include only the still images that depict task performance. Trimming the image data reduces file size of image data used in method 200 by excluding image data that does not depict task performance.

In examples where more than one individual is present in the image data, the individual images can be cropped by processor 102 so that only a single individual is contained in the extracted individual images. Where more than one individual is present in the image data extracted in step 202, each individual can be assigned an identifier, and the cropped individual images can be associated with the identifier. The identifier can be, for example, a name, a number, or another suitable method of identifying the individual. In some examples, the individual images are cropped such that only the individual's face is contained in the extracted individual images. The program used to identify an individual from image data can be, for example, a machine learning model, such as a computer vision model.
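A minimal sketch of face-based cropping follows, assuming OpenCV's bundled Haar cascade detector; a production system might use a stronger computer vision model, and tracking a consistent identifier for the same individual across frames is omitted here.

```python
# Hedged sketch of cropping each detected face and tagging it with an identifier.
import cv2

_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def crop_individuals(frame):
    """Return (identifier, face_crop) pairs for one still image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = _face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [
        (identifier, frame[y:y + h, x:x + w])
        for identifier, (x, y, w, h) in enumerate(faces)
    ]
```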

In step 206, the extracted images are analyzed to generate a first feature set. The features can be identified using a computer-implemented machine learning model trained to identify features from the image data. Step 206 can be performed by programs and/or machine learning models of feature extraction module 130. The features identified in step 206 are features related to information conveyance, such as body language visible in the image data. In some examples, the machine learning model can be trained only to identify features related to task performance. In other examples, a first machine learning model can be trained to identify a broad set of features visible in the image data, including features that are not related to task performance, and features related to task performance can be determined by a second computer-implemented machine learning model for use with subsequent steps of method 200. Additionally and/or alternatively, a broad set of features can be included in the first feature set and computer-implemented machine learning algorithms used in steps 230 and/or 232 of method 200 can be trained to identify features related to task performance.

For example, if the task is public speaking, relevant features included in step 206 could be related to eye contact, body posture (e.g., presence of slouching), or another element of body language that relates to public speaking effectiveness. In other applications, the features can include one or more of, for example, hand gestures, head tilt, the presence and amount of eye contact, the amount of eye blinking, forehead wrinkling, mouth position, mouth shape, eyebrow shape, and/or eyebrow position.

In step 214, processor 102 extracts audio from the video data acquired in step 202. The extracted audio is stored to memory 104. Processor 102 can use programs of video processing module 120 to perform step 214. Processor 102 can be configured to identify an individual performing the task in the audio data and store only audio data corresponding to task performance as the extracted audio data. For example, processor 102 can execute a computer-implemented machine learning model trained to identify audio data that corresponds to an individual performing the task and, accordingly, store only audio data containing task performance to memory 104 during step 214, reducing the file size of audio data used in method 200 by excluding audio data that does not correspond to task performance.

Where more than one individual is present in the audio data extracted in step 202, each individual can be assigned an identifier, and the trimmed audio can be associated with the identifier. The identifier can be, for example, a name, a number, or another suitable method of identifying the individual among the other individuals in the audio. The program used to determine which portions of the audio correspond to task performance by the individual can be, for example, a machine learning model. In some examples, speaker diarization of the audio file can be performed to separate the audio corresponding to each individual. As described previously, image data for each individual in the video data can also be isolated (e.g., by cropping), and the image and audio data corresponding to each individual can be re-associated prior to generation of an adjustment recommendation and/or a performance score in steps 230 and 232, respectively.

In step 216, the extracted audio data is analyzed to generate a second feature set. The features can be identified using a computer-implemented machine learning model trained to identify features from the audio data. Step 216 can be performed by programs and/or machine learning models of feature extraction module 130. The features identified in step 216 are features related to information conveyance, such as vocal tone or cadence. In some examples, the machine learning model can be trained only to identify features related to task performance. In other examples, a first machine learning model can be trained to identify a broad set of features present in the audio data, including features that are not related to task performance, and features related to task performance can be determined by a second computer-implemented machine learning model for use with subsequent steps of method 200. Additionally and/or alternatively, a broad set of features can be included in the second feature set and computer-implemented machine learning algorithms used in steps 230 and/or 232 of method 200 can be trained to identify features related to task performance.

For example, if the task is to provide an untruthful response to questioning or interrogation, relevant features included in step 216 could be the presence of vocal wavering, stuttering, or another element of vocalization that relates to the effectiveness of lying. In other applications, the features can include, for example, pitch, intonation, inflection, sentence stress, or another audio element indicative of information conveyance.

In some examples, the audio data can be converted to an audio spectrogram that can be analyzed in step 216 to generate the second feature set. FIG. 3 is a flow diagram of method 300, which is a method of analyzing audio data that can be performed during step 216 of method 200. Method 300 includes steps 302-304 of generating an audio spectrogram (step 302) and analyzing the audio spectrogram to identify features for the second feature set (step 304).

In step 302, processor 102 converts the audio data extracted in step 214 to a spectrogram. The spectrogram can describe, for example, the amplitude or frequency ranges of the audio data. In step 304, processor 102 identifies features present in the audio spectrogram for inclusion in the second feature set. In some examples, processing the audio data as an audio spectrogram enables processor 102 to more easily identify features in the audio data.
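A minimal sketch of step 302 follows, assuming SciPy and a mono WAV file such as the one extracted in step 214; the file path is a placeholder.

```python
# Converts extracted audio to a spectrogram for feature identification (step 302).
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

sample_rate, samples = wavfile.read("audio.wav")  # hypothetical mono file
frequencies, times, amplitudes = spectrogram(samples.astype(np.float32), fs=sample_rate)
# amplitudes[i, j] is the energy near frequencies[i] Hz at times[j] seconds; this
# 2-D array is what would be analyzed to identify features in step 304.
```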

Returning to method 200, in step 224, processor 102 extracts semantic text data from the audio data extracted in step 214. As used herein, semantic text data refers to data that represents spoken words, phrases, sentences, and other sounds produced by the individual as readable text. The extracted semantic text data is stored to memory 104. Processor 102 can use programs of video processing module 120 to perform step 224. The semantic text data can be extracted using, for example, a speech-to-text program. In some examples, the semantic text data can be extracted from trimmed audio data that only includes audio data of the task performance. In other examples, semantic text data is extracted from the complete audio data extracted in step 214, and the semantic text data can be subsequently trimmed to only include semantic text that occurs during task performance. The semantic text data can be trimmed based on, for example, timestamp information from the trimmed audio or image data. The semantic text data can also be trimmed based on content, such as by a computer-implemented machine learning model trained to identify semantic text relevant to task performance. Where more than one individual is present in the audio data extracted in step 214, semantic text data can be generated separately for each individual's audio data and the identifier assigned to each individual's audio data in step 214 can also be assigned to the semantic text data generated in step 224.
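By way of non-limiting illustration, the open-source Whisper model can serve as the speech-to-text program; the model size and file path below are assumptions, not requirements of the disclosure.

```python
# Illustrative semantic text extraction (step 224) with a speech-to-text model.
import whisper

model = whisper.load_model("base")       # assumed model size
result = model.transcribe("audio.wav")   # audio extracted in step 214
semantic_text = result["text"]           # spoken words as readable text
```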

In step 226, the extracted semantic text data is analyzed to generate a third feature set. The features can be identified using a computer-implemented machine learning model trained to identify features from the semantic text data. The features can be, for example, phonemes, words, phrases, sentences, or other units of language that convey information and are stored in the semantic text data. The features can also be, for example, one or more intents and/or one or more entities in the semantic text data, as recognized by a natural language understanding model. Step 226 can be performed by programs and/or machine learning models of feature extraction module 130.

In some examples, the machine learning model can be trained only to identify features related to task performance. In other examples, a first machine learning model can be trained to identify a broad set of features present in the semantic text data, including features that are not related to task performance, and features related to task performance can be determined by a second computer-implemented machine learning model for use with subsequent steps of method 200. Additionally and/or alternatively, a broad set of features can be included in the third feature set and computer-implemented machine learning algorithms used in steps 230 and/or 232 of method 200 can be trained to identify features related to task performance.

In step 230, processor 102 generates an adjustment recommendation. The adjustment recommendation is generated using a relational feature model and at least one of the first, second, and third sets of features, and includes one or more instructions for actions or adjustments to features relevant to task performance. Step 230 can be performed by programs and/or machine learning models of performance adjustment module 140.

The relational feature model relates features and their contribution to task performance. Relational feature models are task-specific, such that different relational feature models are used to determine adjustment recommendations for different tasks. Each feature relevant to task performance and its respective impact on task performance for a given task can be stored in the relational feature model. The relational feature model can be constructed, for example, by analyzing video data of positive and/or negative task performances and correlating the features present in the video data to task performance. Whether a performance is positive or negative can, in some examples, be determined subjectively for the purpose of constructing the relational feature model.

In some examples, the relational feature model can be a computer-implemented machine learning model trained to correlate features of the first, second, and/or third feature sets with task performance. In other examples, the relational feature model can be generated during the process of training a computer-implemented machine learning model to identify features having high predictive accuracy for positive or negative task performances.

The relational feature model can represent the contribution of each feature as a value indicative of a positive and/or negative contribution to task performance.
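As a hedged illustration, a relational feature model for the public-speaking task could be as simple as a mapping from feature names to signed contribution values; the feature names and weights below are invented for illustration only.

```python
# Hypothetical relational feature model: positive values contribute to positive
# task performance, negative values to negative task performance.
RELATIONAL_FEATURE_MODEL = {
    "eye_contact":     +0.8,
    "upright_posture": +0.5,
    "steady_cadence":  +0.6,
    "slouching":       -0.7,
    "vocal_waver":     -0.6,
    "filler_words":    -0.4,
}
```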

Relational feature models describing the performance of different tasks may correlate the same features differently. Using the task of public speaking as an example, video data of a public speaker that appears uncomfortable or stumbles over their words can be used to generate a relational feature model focused on negative performance, while video data of a public speaker that speaks clearly and has body language that suggests they are at ease can be used to generate a relational feature model focused on positive performance. However, using the task of acting as a further example, video data of an individual that appears uncomfortable or stumbles over their words can be used as a model for positive performance for an acting role portraying an individual with public speaking difficulties. The above examples illustrate the advantages of separately weighing the contribution of various features to positive or negative task performance in a task-specific manner and are provided for non-limiting illustrative purposes. In other task examples, the relationship between particular features and task performance can vary according to the requirements of the task.

The adjustment recommendation generated in step 230 provides one or more instructions for actions or adjustments to features relevant to task performance. For example, the adjustment recommendation can include instructions that the individual use a particular hand gesture, body language component, physical posture, physical motion, or another suitable physical action. As a further example, the adjustment recommendation can include instructions that the individual adjust their cadence, adjust their vocal tone, adjust their vocal pitch, adjust their speaking volume, adjust pronunciation of one or more words, reduce vocal quaver, or perform another adjustment to vocalized sounds produced during task performance. As yet a further example, the adjustment recommendation can include instructions that the individual incorporate one or more words, phrases, or sentences into their spoken language and/or that the individual cease using one or more words, phrases, or sentences in their spoken language. An individual can act on the recommended actions or adjustments to adjust their performance to be more like positive performances used to generate the relational feature model. In at least some examples, the adjustments provided by the adjustment recommendation improve the individual's performance.

Processor 102 can generate the adjustment recommendation by, for example, comparing the features of the first, second, and/or third sets of features with the relational feature model. Processor 102 can identify features present in the video data that are, for example, associated with negative performance and generate a recommendation that the individual not perform those features again in a subsequent iteration of the task. Similarly, processor 102 can identify features associated with positive performance that are not present in the video data and generate a recommendation that the individual perform one of those features in a subsequent iteration of the task.
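A minimal sketch of that comparison follows, using the hypothetical relational feature model above; `observed` would be the union of the first, second, and third feature sets.

```python
# Recommend dropping observed negative features and adopting absent positive ones.
def generate_adjustment_recommendation(observed, model):
    recommendations = []
    for feature, weight in model.items():
        if weight < 0 and feature in observed:
            recommendations.append(f"Reduce or eliminate: {feature}")
        elif weight > 0 and feature not in observed:
            recommendations.append(f"Incorporate: {feature}")
    return recommendations

# generate_adjustment_recommendation({"slouching", "steady_cadence"},
#                                    RELATIONAL_FEATURE_MODEL)
# -> ["Incorporate: eye_contact", "Incorporate: upright_posture",
#     "Reduce or eliminate: slouching"]
```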

In some examples, processor 102 can use a trained machine learning model to determine what actions should be included in the recommendation. For example, the machine learning model can be trained on a series of previous recommendations and their effectiveness in improving performance. Advantageously, using a machine learning model trained on the success of prior recommendations can significantly improve the usefulness of the recommendation generated in step 230 of method 200. Processor 102 can use the trained machine learning model to output a limited number of recommendations having a predictive accuracy for task improvement above a certain threshold. Limiting the number of recommendations from a larger pool of recommendations can, in some examples, simplify the recommendation generated in step 230 and thereby improve the ability of the individual to act on the recommendation after it is output in step 234 subsequently.

Additionally and/or alternatively, processor 102 can generate the adjustment recommendation using a simulator. The simulator can use the relational feature model to simulate the effect(s) of the performance of various features on task performance. For example, the simulator can be configured to simulate the effect on task performance of various vocal tone adjustments, body language adjustments, and/or word choice adjustments on task performance and output the predicted effect on task performance of those adjustments as one or more numeric values. The simulator can determine the numeric values representing the effect on performance of one or more features using, for example, the relational feature model. The numeric values output by the simulator can then be used to determine which action or actions will lead to improved task performance. For example, if higher numeric values are associated with improved task performance, the action or combination of actions having the highest numeric value can be selected as the adjustment recommendation.

In some examples, processor 102 can generate the adjustment recommendation using simulation-based optimization. In these examples, processor 102 can use a simulator and an optimizer to determine an optimal action or combination of actions to be included in the adjustment recommendation. The optimizer can use as inputs the outputs of the simulator to determine an optimal action or combination of actions that result in optimal task performance. The optimizer can determine which of the actions has the highest numeric value and that action can be selected as the adjustment recommendation.

In examples where a single action is desired as the adjustment recommendation, the simulator can determine numeric values that represent the effect on performance of each action individually. In some examples where more than one action is desired as the adjustment recommendation, the simulator can simulate the effect on performance of each action individually and the optimizer can determine an optimal combination based on the number of actions desired for the adjustment recommendation. Additionally and/or alternatively, the simulator can simulate the effect on performance of various combinations of actions and the optimizer can determine which combination of actions results in optimal task performance. Notably, while the adjustment recommendation can comprise an unlimited number of actions, it can be advantageous to limit the number of actions included in the adjustment recommendation. As described previously, limiting the number of actions included in the adjustment recommendation can be useful to simplify the recommendation generated in step 230, which in turn can improve the ability of the individual to act on the recommendation after it is output in step 234.
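One possible form of the simulator and optimizer is sketched below, under the assumption that the relational feature model supplies the numeric value of each action; a deployed simulator could model interactions between actions rather than simply summing them.

```python
# Hedged sketch of simulation-based optimization over candidate actions.
from itertools import combinations

def simulate(actions, model):
    """Predicted effect on task performance of performing `actions` together."""
    return sum(model.get(action, 0.0) for action in actions)

def optimize_recommendation(candidate_actions, model, max_actions=2):
    """Return the combination of at most max_actions actions with the best value."""
    best_combo, best_value = (), float("-inf")
    for k in range(1, max_actions + 1):
        for combo in combinations(candidate_actions, k):
            value = simulate(combo, model)
            if value > best_value:
                best_combo, best_value = combo, value
    return best_combo, best_value
```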

As described previously, processor 102 can be configured to generate the adjustment recommendation using one, multiple or all of the first, second, and third feature sets. The number of feature sets can be based on the task. For example, if the task is lying, the word choice may not be as important as body language and vocal tone. Accordingly, processor 102 can generate the adjustment recommendation based on only the first and second feature sets. In examples where fewer than all three of the first, second, and third feature sets are used to generate the adjustment recommendation, method 200 can omit one or more sets of steps 204-206, 214-216, and 224-226. For example, method 200 can omit steps 224-226 where semantic text features are not used to generate the adjustment recommendation. Similarly, in examples where features from the image data and/or audio data are not used to generate the adjustment recommendation, method 200 can omit steps 204-206 and/or 214-216, respectively.

In step 232, processor 102 generates a performance score. Processor 102 can generate the performance score using the relational feature model and the features of the first, second, and/or third feature sets. Step 232 can be performed by programs and/or machine learning models of performance scoring module 150. The feature sets used to generate the performance score can be the same feature sets used to generate the adjustment recommendation. The performance score is an alphanumeric character or string of characters that communicates the effectiveness of the individual's performance. Processor 102 can use a software program to compare features of the relational feature model with features present in the first, second, and/or third feature sets to determine, for example, the number of positive features performed by the individual, the relative contribution of those positive features to effective performance, the number of negative features performed by the individual, and/or the relative contribution of those negative features to effective performance. Processor 102 can use the results of the comparison to generate the performance score in step 232. In some examples, processor 102 can use a computer-implemented machine learning model to generate the performance score. The machine learning model can be trained using combinations of features labeled with their relative contribution to task performance.
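A minimal scoring sketch, assuming the hypothetical relational feature model above; the normalization and letter-grade cutoffs are illustrative assumptions, not part of the disclosure.

```python
# Net the contributions of observed features and map the result to a letter grade.
def performance_score(observed, model):
    net = sum(weight for feature, weight in model.items() if feature in observed)
    max_positive = sum(weight for weight in model.values() if weight > 0)
    ratio = net / max_positive if max_positive else 0.0
    for grade, cutoff in (("A", 0.8), ("B", 0.6), ("C", 0.4), ("D", 0.2)):
        if ratio >= cutoff:
            return grade
    return "F"
```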

The performance score generated in step 232 enables the individual performing the task to understand how well they are performing the task as compared to the positive or ideal examples and/or the negative or unideal examples used to generate the relational feature model. An individual can act on prior knowledge of how to effectively perform the task in response to a lower performance score. Similarly, an individual can perform the task with decreased anxiety or doubt about the effectiveness of their performance in response to a higher performance score.

In step 234, the adjustment recommendation generated in step 230 is output. Similarly, in step 236, the performance score is output. The adjustment recommendation can be output as one or more words or phrases that identify the actions the individual should perform. The performance score can also be output as one or more words or phrases, or as the alphanumeric characters generated in step 232. The adjustment recommendation and/or the performance score are output to user interface 106 or a similar user interface device. For example, each of the adjustment recommendation and the performance score can be displayed to a screen or heads-up device within view of the individual performing the task or can be output as audio data to a speaker, earpiece, headphone, or similar device.

In some examples, method 200 is only used to generate one of the adjustment recommendation or the performance score. In these examples, steps 230/234 or steps 232/236 can be omitted from method 200. For example, the overall performance score may be relatively uninformative, especially for individuals that are not able to act on the performance score using prior knowledge, such as individuals lacking prior experience with the task. Similarly, in some examples, only a performance score may be required to adjust task performance. For example, if an individual is highly familiar with actions that adjust task performance, displaying only the performance score may be sufficient to help the individual ensure that they are performing effectively.

In some examples, method 200 can be performed using video data (i.e., video data acquired in step 202) depicting an entire iteration of a task. In these examples, the adjustment recommendation and/or performance score can be output in steps 234 and/or 236, respectively, after the individual has completed performance of one iteration of the task. The individual can use the adjustment recommendation and/or performance score to adjust subsequent iterations of the task. Method 200 can be repeated for subsequent iterations of the task to provide updated feedback on how well the individual is performing the task.

In other examples, method 200 can be performed using video data (i.e., video data acquired in step 202) depicting only a portion of an iteration of a task. In these examples, the adjustment recommendation and/or performance score can be output in steps 234 and/or 236, respectively, during the same iteration of the task depicted in the video data. The individual can use the adjustment recommendation and/or performance score to adjust the current iteration of the task as well as subsequent iterations of the task. Method 200 can be repeated throughout the same iteration of the task to provide periodic or substantially real-time feedback on how well the individual is performing the task.

As described previously, method 200 can be performed for any task in order to coach individuals based on performed elements of the task. The task can be, for example, lying, acting, public speaking, lecturing, teaching, or another suitable task with performable elements. Method 200 advantageously allows concrete feedback immediately following performance of a task and/or in real time during task performance. Advantageously, method 200 allows task performance to be measured solely based on video data of an individual rather than on biometric measurements or other more invasive measurement techniques.

FIG. 4 is a flow diagram of method 400, which is a method of generating a relational feature model for use with method 200. Method 400 includes steps 402-410 of generating a training data set (step 402), generating features from the training data set (step 404), training a computer-implemented machine learning model with the features (step 406), selecting features having a predictive accuracy above a pre-determined threshold (step 408), and generating the relational feature model (step 410).

In step 402, a training data set is generated. The training data can be, for example, video data or one or more of separated audio data, image data, and semantic text data. The training data depicts individuals performing the task well and/or poorly. The data can depict multiple iterations of the task by multiple individuals to improve the accuracy of subsequent recommendations made using the relational feature model. One or more operators can make subjective decisions about what types of performances constitute good or bad task performances. In some examples, good performances can be identified from publicly-available videos of tasks having view counts, approval ratings, comment interaction, and/or another suitable statistic above a certain threshold. For example, videos depicting the task having view counts above a particular threshold and community approval ratings above a certain threshold can be downloaded from a video sharing or social media website and used to create the training data set. As a further example, an individual can view one or more videos of task performance and, based on their prior experience with the task, select videos for the training data set. In some examples, the data stored in the training data set can be labeled to indicate that the depicted performance is a good performance or a bad performance. In other examples, the training data set is selected to only contain one of good performances and bad performances.

In step 404, features are generated from the training data set generated in step 402. The features can be generated in substantially the same way as the first, second, and third feature sets were generated in steps 202-226 of method 200, as described previously. Where the training data set generated in step 402 includes labeled data, the feature information generated in step 404 can include the same or substantially the same data labels as those created in step 402.

In step 406, the machine learning model is trained with the feature set generated in step 404. As used herein, “training” a computer-implemented machine learning model refers to any process by which parameters, hyperparameters, weights, and/or any other value related to model accuracy are adjusted to improve the fit of the computer-implemented machine learning model to the training data. Training the machine learning model in step 406 improves the accuracy with which the machine learning model is able to predict whether features derived from video data not included in the training data set generated in step 402 (e.g., features derived from data of a test data set) depict a positive or negative performance of the task. The features used to evaluate the machine learning model can be generated in substantially the same way as described previously with respect to step 404. Step 406 can be repeated multiple times until the predictive accuracy of the machine learning model is above a particular threshold.

In step 408, features having a predictive accuracy above a threshold value are selected for use with subsequent step 410. The predictive accuracy of each feature is determined according to the predictive accuracy of each feature assigned by the machine learning model trained in step 406. The threshold value used to select features in step 408 is determined based on the task being evaluated, the total number of features generated in step 404, and/or another operational parameter to distinguish features positively and/or negatively associated with task performance from features that are not associated with task performance.

In step 410, the relational feature model is generated. The relational feature model is generated based on the features selected in step 408 and the relative contribution of those features to task performance. The relative contribution of each feature to task performance is based on the predictive accuracy associated with the feature and, in some examples, one or more other values. The other values can be derived from the video data and can include, for example, a value describing the perceived importance of the feature, the number of times that the feature appears in the training data, a value describing whether the feature contributes positively or negatively to task performance, or another suitable value. Following step 410, the relational feature model can be used with method 200 for the task.
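A hedged sketch of steps 406-410 follows, assuming scikit-learn and a training matrix X of 0/1 feature-presence indicators with labels y of 1 (good performance) or 0 (bad performance); estimating per-feature predictive accuracy with a one-feature classifier is one possible realization, not the only one.

```python
# Select features whose individual predictive accuracy clears the threshold
# (step 408) and weight them by signed accuracy (step 410).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def build_relational_feature_model(X, y, feature_names, threshold=0.6):
    model = {}
    for j, name in enumerate(feature_names):
        column = X[:, [j]]  # one feature at a time
        accuracy = cross_val_score(LogisticRegression(), column, y, cv=5).mean()
        if accuracy > threshold:
            clf = LogisticRegression().fit(column, y)
            sign = np.sign(clf.coef_[0, 0])  # + helps performance, - hurts it
            model[name] = float(sign * accuracy)
    return model
```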

Notably, method 400 can be repeated for multiple tasks to create multiple relational feature models, with each relational feature model being used to relate features identifiable by steps 202-226 of method 200 with performance for a particular task.

Advantageously, the methods described herein enable adjustment of task performance for a wide variety of tasks, including communication-related tasks. In particular, method 200 allows task improvement based on pre-existing data of a task without requiring individuals to personally seek out and acquire advice from individuals skilled in the task. Further, especially for communication-related tasks, different experts can often provide different and, in some instances, conflicting advice, which can make it difficult for individuals to improve task performance. As described with respect to method 400, the relational feature model used by method 200 can be generated using multiple individuals performing the task multiple times, allowing recommendations made using method 200 to reflect the styles or preferences of multiple individuals rather than solely the preferences of a single individual. To this extent, the recommendations made using method 200 are more versatile in applicability to different circumstances than recommendations made using conventional instructional techniques.

Further, as described previously, method 200 enables individuals having impairments and/or disabilities affecting sensory organs to use the recommendations provided by method 200 to improve task performance rather than relying on conventional feedback mechanisms, such as visually surveying a crowd.

Claims

1. A method of adjusting task performance, the method comprising:

acquiring video data of a first individual performing a first iteration of the task;
extracting image data and audio data from the video data;
extracting semantic text data from the audio data;
analyzing, by a first computer-implemented machine learning model, at least one of the image data, the audio data, and the semantic text data to identify a first set of features;
generating an adjustment recommendation based on the first set of features and a relational feature model, wherein: the relational feature model relates features and task performance; and the adjustment recommendation comprises instructions that can be performed by the first individual to adjust task performance; and
outputting the adjustment recommendation.

2. The method of claim 1, wherein the adjustment recommendation comprises instructions to perform features of the relational feature model.

3. The method of claim 1, wherein:

the adjustment recommendation is generated before the first iteration of the task is complete;
the adjustment recommendation identifies actions that can be performed by the first individual during the first iteration of the task; and
the adjustment recommendation is output to the first individual during the first iteration.

4. The method of claim 1, wherein generating the adjustment recommendation comprises:

simulating the effect of a plurality of actions on task performance; and
determining an optimum combination of actions for improving task performance;

5. The method of claim 4, wherein outputting the adjustment recommendation comprises outputting one or more instructions for performing the optimum combination of actions for improving task performance.

6. The method of claim 5, wherein:

simulating the effect of the plurality of actions on task performance comprises: selecting the plurality of actions based on the first set of features; and determining a numeric value for each of the plurality of actions using a simulator and the relational feature model; and
determining an optimum action for improving task performance comprises determining which of the plurality of actions has the highest numeric value.

7. The method of claim 6, further comprising generating, before acquiring video of the first individual, the relational feature model using a second computer-implemented machine learning model.

8. The method of claim 7, wherein generating the relational feature model comprises:

generating a training set of video data, wherein the training set of video data depicts preferred task performance;
generating a training set of features from the training set of video data;
training the second computer-implemented machine learning model with the training set of features;
selecting features of the training set of features having, as determined by the trained second computer-implemented machine learning model, predictive accuracy above a pre-determined threshold; and
generating the relational feature model based on the selected features and the predictive accuracies of the selected features.

9. The method of claim 6, wherein the adjustment recommendation identifies actions that can be performed during a second iteration of the task to adjust task performance.

10. The method of claim 9, wherein the task is lying, acting, lecturing, public speaking, or teaching.

11. The method of claim 6, wherein the adjustment recommendation comprises a recommended body language adjustment for the first individual.

12. The method of claim 6, wherein the adjustment recommendation comprises a recommended vocal tone adjustment for the first individual.

13. The method of claim 6, wherein the adjustment recommendation comprises one or more recommended words to be spoken by the first individual.

14. The method of claim 1, wherein the image data is analyzed by the first computer-implemented machine learning model, and further comprising analyzing, by a second computer-implemented machine learning model, the audio data to identify a second set of features, wherein the adjustment recommendation is based on the first set of features, the second set of features, and the relational feature model.

15. The method of claim 14, and further comprising analyzing, by a third computer-implemented machine learning model, the semantic text data to identify a third set of features, wherein the adjustment recommendation is based on the first set of features, the second set of features, the third set of features, and the relational feature model.

16. The method of claim 1, wherein the first computer-implemented machine learning model analyzes the image data, and wherein the first computer-implemented machine learning model is a computer vision model.

17. The method of claim 1, wherein the first computer-implemented machine learning model analyzes the audio data, and wherein analyzing the audio data comprises:

converting the audio data to an audio spectrogram; and
analyzing the audio spectrogram to identify the first set of features.

18. The method of claim 1, wherein the first computer-implemented machine learning model analyzes the semantic text data, and wherein the first computer-implemented machine learning model is a natural language understanding model.

19. The method of claim 1, further comprising:

generating a performance score based on the relational feature model and the first set of features, wherein the performance score comprises one or more alphanumeric characters that describe the performance of the first individual; and
outputting the performance score.

20. A system for adjusting task performance, the system comprising:

a processor;
a user interface; and
a memory encoded with instructions that, when executed, cause the processor to: acquire video data of a first individual performing a first iteration of the task; extract image data and audio data from the video data; extract semantic text data from the audio data; analyze, by a first computer-implemented machine learning model, at least one of the image data, the audio data, and the semantic text data to identify a first set of features; generate an adjustment recommendation based on the first set of features and a relational feature model, wherein: the relational feature model relates features and task performance; and the adjustment recommendation comprises instructions that can be performed by the first individual to adjust task performance; and cause the user interface to output the adjustment recommendation.
Patent History
Publication number: 20230176911
Type: Application
Filed: Sep 23, 2022
Publication Date: Jun 8, 2023
Inventor: Michael Griffin (Wayland, MA)
Application Number: 17/951,953
Classifications
International Classification: G06F 9/50 (20060101);