SURVEY SYSTEM WITH MIXED RESPONSE MEDIUM
A system is configured to provide a survey interface that collects response data, including both quantitative and qualitative response data, using multiple capture mediums. Mediums used to capture response data include input forms that collect structured response data on particular questions, as well as multimedia input forms that capture and collect free form multimedia response data in video form. This mix of quantitative and qualitative response data is analyzed across multiple modalities and used to develop an indexed response dataset, which may be queried to determine a set of pre-configured insights. An insight interface visualizes these pre-configured insights and accepts additional queries to provide a query interface that draws from the static indexed response dataset to allow for dynamic, conversational querying for additional insights.
This application claims priority to U.S. Non-Provisional Patent Application Ser. No. 17/575,116, filed Jan. 13, 2022, titled Survey System with Mixed Response Medium, issued as U.S. Pat. No. 11,514,464 on Nov. 29, 2022, which itself claims priority to U.S. Provisional Patent Application No. 63/137,828, filed Jan. 15, 2021, titled Survey System with Mixed Response Medium, the entirety of each of which is incorporated by reference herein.
FIELD
The disclosed technology pertains to a system for collecting and viewing survey response data.
BACKGROUND
The collection and use of feedback, whether from customers or users of a product or software, supporters of a cause, employees of a company, or other various sources of critique or compliment, is an important aspect of many ventures. Gathered information and underlying insights may be used to improve products, identify activities or positions that cause a favorable or unfavorable response from the general public, or determine future strategies for software development or growth, among numerous other uses. Unsurprisingly, receiving high-quality feedback data contributes to the development of high-quality insights. High-quality insights might include feedback and question responses that are clear and consistent (e.g., “most customers love our new shoe”), but may also include less obvious insights that may be determined from the same or similar dataset (e.g., “customers in colder climates hate our new shoe because it is not water resistant”).
Customer feedback mechanisms such as evaluations, surveys, and online analytics can produce limited data and insights due to their pre-generated, rigid nature. As an example, a survey question collecting structured data associated with pre-defined questions, such as “Rate our shoe with a score from 1 to 10,” can provide very clear results, but those results will be limited both by the questions asked and by the limited, structured format of the responses (e.g., a user can only provide a numeric rating between 1 and 10). Thus, in many cases such feedback mechanisms are unable to gather data usable to identify less obvious insights, and instead primarily provide feedback that is expressly related to the question asked. Despite these limitations, these customer feedback mechanisms remain popular due to the scale at which they can be conducted and their relatively low cost.
A conventional approach to gathering less structured feedback might include usability tests, focus groups, and interviews to allow for more open, free-form feedback that may also be interactive. This might include unguided discussion on a particular product or service where the respondent is entirely responsible for the flow of discussion, but may also include guided discussions where an interviewer may direct attention to certain topics, or ask dynamic follow up questions in response to prior discussion. A major drawback of these feedback and data acquisition methods is the significant expense involved both in gathering the data and in interpreting it. This is because, unlike structured data, which is easily programmatically interpreted, unstructured responses are typically manually considered, relying upon a reviewer's subjective experience and approach, in order to develop insights.
What is needed, therefore, is an improved system for producing high quality feedback across mixed response mediums.
The drawings and detailed description that follow are intended to be merely illustrative and are not intended to limit the scope of the invention as contemplated by the inventors.
The inventors have conceived of novel technology that, for the purpose of illustration, is disclosed herein as applied in the context of feedback and survey systems. While the disclosed applications of the inventors' technology satisfy a long-felt but unmet need in the art of feedback and survey systems, it should be understood that the inventors' technology is not limited to being implemented in the precise manners set forth herein, but could be implemented in other manners without undue experimentation by those of ordinary skill in the art in light of this disclosure. Accordingly, the examples set forth herein should be understood as being illustrative only, and should not be treated as limiting.
Implementations of the disclosed system combine qualitative and quantitative responses to questions to allow for certain pre-configured insights to be automatically determined, to allow for conversational style dynamic querying for insights, or both. Providing response data in the form of pre-configured, or “top level” insights, and in the form of “follow up” questions as part of conversational querying, allows for qualitative response data in various mediums (e.g., video, audio, free form text) to be utilized in a scalable and meaningful way.
For example, an online video survey that provides qualitative response data in the form of a video (e.g., images and audio) is desirable, since free form video feedback will include information that is broader and deeper than quantitative, structured data responses. However, review of video response data from an online survey is not easily scalable, and would generally require at least a 1 to 1 ratio of content to review (e.g., 100 hours of video may generally require at least 100 hours of manual review). Thus, a disadvantage of online video feedback becomes the curse of too much data, with some large datasets (e.g., hundreds to thousands of hours of video) being easily gathered, but far more difficult to review, making human assessment impracticable.
By combining qualitative and quantitative response data with each other, and with other information that can be extracted from mixed response mediums, a system may be implemented that can identify meaningful top level insights within large datasets, can allow for conversational style querying of large datasets, or both. Implementations of the disclosed technology may utilize complex distributed computing systems, software architectures, unique user interfaces, machine learning techniques, and other features to quickly extract and analyze non-obvious, insightful data from large datasets including mixed medium response data.
Turning now to the figures,
The system may then analyze (104) the mixed medium response data to identify any potentially meaningful data contained in the mixed response mediums, and to begin to inter-associate portions of the response data based upon relationships such as semantic similarity or relatedness, temporal relationships, sentiment relationships, or other relationships. Some implementations of the system may use a multi-modal data synthesis approach to analyze and mine large mixed medium response datasets at various levels of granularity to produce these connections. As an example, video response data may be split into multiple modalities (e.g., audio data, image data, text transcript, video metadata) that are analyzed separately and in a multi-modal manner to extract unique data features. Extracted data features may be combined with quantitative response data and other response data, such as demographic data, physiological data from wearable devices, or survey form session context data, for example, to produce additional connections and relationships between response data, and to create the potential for identifying deeper and less obvious insights based on the response data.
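By way of concrete illustration, the following is a minimal sketch, in Python, of how video response data might be decomposed into separate modalities for independent analysis. The extraction helpers are hypothetical placeholders standing in for real decoders, speech-to-text services, and vision models; this is not the claimed implementation.

from dataclasses import dataclass, field


@dataclass
class VideoModalities:
    """Per-modality views of a single video response."""
    frames: list = field(default_factory=list)       # still images sampled from the video
    audio_track: bytes = b""                         # raw audio for tone/emotion analysis
    transcript: list = field(default_factory=list)   # (timestamp_sec, word) pairs
    metadata: dict = field(default_factory=dict)     # duration, resolution, device, etc.


# Hypothetical placeholder extractors so the sketch runs end to end; a real
# system might use ffmpeg for frame/audio extraction and a speech-to-text
# service for the timestamped transcript.
def sample_frames(video_bytes, every_n_seconds=1):
    return []


def extract_audio(video_bytes):
    return b""


def transcribe(audio_bytes):
    return [(0.0, "hello")]


def decompose_video(video_bytes: bytes) -> VideoModalities:
    m = VideoModalities()
    m.frames = sample_frames(video_bytes, every_n_seconds=1)
    m.audio_track = extract_audio(video_bytes)
    m.transcript = transcribe(m.audio_track)  # word-level timestamps preserved
    m.metadata = {"duration_sec": 120}        # illustrative metadata only
    return m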
The response dataset may then be queried at a high level based upon pre-configured insight queries (topic specific, sentiment specific, demographic specific, time allotment specific, etc.) to determine (106) a set of top level insights represented within the response data. Top level insights may be provided (108) to a user of the system via an insight interface which may include textual and numerical descriptions of insights as well as complex interactive visualizations of the insights. Some implementations of interfaces provided by the system may also include providing (110) a conversational query interface that allows additional, dynamic interactions with the response dataset to search for non-obvious insights or otherwise develop information about insights not represented in the top level insights.
The conversational query aspect of the system is especially advantageous for large datasets of qualitative response data, as it allows for response data to be explored at a high level of granularity based upon prior insights by precisely drawing from the qualitative response data, rather than requiring users to manually review substantial portions of the qualitative response data. With conversational querying of the response dataset, users can explore extracted data features iteratively by using insights previously produced by the framework to generate new queries or refine areas of the data to explore. This may be performed cyclically by leveraging previously derived data features along with new user supplied data directed by insightful visualizations (e.g., such as a diagram identifying commonly used nouns that, when clicked on, query the response dataset for additional insights related to the clicked noun, such as video clips or transcript text where the noun is mentioned).
Users can start with the basic unidirectional analysis (e.g., top level insight analysis), and then can ask new questions based on the results of the unidirectional analysis. New questions are, in effect, asked against the previously generated data features to perform subsequent analysis tasks based on the previously unknown insights. As an example of the above, once data features are initially extracted from qualitative response data, such as video clips, a first pass may be made on the features to produce an initial insight based on the original objective of the video survey (e.g., such as by executing pre-configured queries against the indexed response dataset to determine top level insights). Visualizations may be produced that provide non-obvious insights concerning the original objectives, and which may be further explored when they are determined to be high value insights for which more information is desired. After a user selects an interesting new avenue of insight exploration, the new query or follow up question may be executed against the indexed response dataset to perform a new analysis focused on the selected subset of the data features, with the results being updated to the insight interface as new or updated visualizations and new possible avenues or directions of insights to explore.
As an example of the above, the system may analyze (104) a response dataset provided in response to a quantitative prompt to rate how likely you are to host a party at your home with a score between 1 and 10, and a qualitative prompt to record a short video discussing your thoughts on hosting a party at your home. One pre-configured insight is to determine, within the qualitative response data, an aspect of hosting a party that is associated with very positive sentiment, and an aspect of hosting a party that is associated with very negative sentiment. The system may execute this pre-configured query on the response dataset and determine that “friend” has a very high positive sentiment across all quantitative responses (e.g., whether a respondent rated their desire to host a party at 1 or 10, all spoke positively about a desire to be with friends in their qualitative response), while “pet” had a very negative sentiment across quantitative responses that indicate a low desire to host a party (e.g., those who rated their desire between 1 and 4 spoke negatively about “pets”), perhaps suggesting that some respondents would be more likely to host a party if some issue relating to a “pet” could be resolved.
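As a rough sketch of how such a pre-configured sentiment query might operate over tabulated analysis results, the following Python fragment (using pandas; all column names and values are illustrative assumptions rather than the disclosed implementation) computes per-topic sentiment overall and restricted to low quantitative ratings:

import pandas as pd

# One row per (respondent, topic) mention extracted from video transcripts;
# "rating" is the 1-10 quantitative response, "sentiment" is in [-1, 1].
mentions = pd.DataFrame({
    "respondent": [1, 1, 2, 2, 3],
    "rating":     [9, 9, 2, 2, 3],
    "topic":      ["friend", "pet", "friend", "pet", "pet"],
    "sentiment":  [0.9, 0.1, 0.8, -0.7, -0.8],
})

# Sentiment for each topic across all respondents ...
overall = mentions.groupby("topic")["sentiment"].mean()

# ... and restricted to respondents with low ratings (1-4), surfacing the
# "pets are viewed negatively by reluctant hosts" style of insight.
low = mentions[mentions["rating"] <= 4].groupby("topic")["sentiment"].mean()
print(overall, low, sep="\n")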
Continuing the example, this information could be presented to a user of the system via an insight interface (108) that describes various characteristics of the response data (e.g., number of respondents, questions asked, forms of responses, etc.), descriptions and/or visualizations of the already determined (106) insights, viewable portions of qualitative video data that is related to the determined (106) insights (e.g., audio clips, images, video clips selected from the qualitative response data where respondents were discussing “friend” or “pet”). The system may also provide a conversational query feature (110) that allows the user to query or “ask” the system to provide further insights related to “friend” or “pet”. In effect, this simulates a conversational aspect to interacting with the response data, as the response data itself has already been captured and is statically stored, but additional queries may be provided in the form of “follow up” questions related to prior determined insights to gain additional insights from the large response dataset.
Continuing the example, after viewing several short video clips selected from the qualitative data where respondents discuss “pet”, the user may use the provided (110) interface to ask about “pet”, which may include typing in a free form question, selecting from an automatically generated list of questions, or selecting an interface element that is part of a visualization or other representation of the “pet” insight (e.g., clicking on the word “pet” in a word cloud, or selecting it from a scatter chart or other diagram). Upon selecting “pet” as a follow up question, the system may execute another set of queries against the response dataset using “pet” as a topic of interest to identify frequently associated topics, positive sentiment topics, and negative sentiment topics that are related to “pet”. The results of this query may be provided (108) via the insight interface, and may indicate, for example, that negative sentiment about pets is most commonly associated with “cat” and “dog” which, after subsequent follow up questions, themselves are commonly associated with “box” and “walk” (e.g., indicating that many respondents feel negatively about hosting a party due to a need to maintain a cat box in their home, or a need to walk a dog frequently), with each update to the insight interface being coupled with additional information and video clips related to the follow up questions (e.g., a montage of 5-10 second video clips where respondents negatively discuss pets).
A unique value of the above example and general approach is that a high quality, valuable insight may be drawn from a large amount of qualitative data, after the qualitative data has already been statically determined. In other words, the qualitative response data is treated as an ongoing conversation where follow up questions may be asked to dynamically provide new insights, without the need for capturing new or additional qualitative data, and without the need for manual review.
Platform interfaces (202) may include APIs, web services, websites, software applications, or other communication interfaces by which surveys may be provided to respondents, and response data may be provided to the server (200). User devices in communication with the platform interfaces (202) may include, for example, mobile devices (204) (e.g., smartphones, tablets, other handheld computing devices), computers (206) (e.g., desktops, laptops, other computing devices), and wearable devices (208) (e.g., fitness trackers, smartwatches). User devices may receive surveys and provide response data, or may provide additional types of response data, such as in the case of wearable devices (208) which may provide, for example, heart rate measurements, exercise or activity information, or other information which may provide additional opportunities for insight identification.
The interface (300) advantageously includes both quantitative (302) and qualitative (304) sections in a single form, session, or transaction, as opposed to prompting for such responses at separate times, or during separate stages of an ongoing survey. In this manner, the different response types and medium types are readily related to each other during analysis (104) of the response inputs, as compared to responses that may be separately prompted for, or provided, which may result in incomplete submissions (e.g., either quantitative or qualitative response portions are skipped), or submissions that are less closely related by the respondent themselves (e.g., after the passage of time, loading of new interfaces, or other actions where prompts are separately provided, the respondent's state of mind relating to the prompts may have changed).
Moving from the circular indicator “A” in
An interface (340) may be provided where quantitative response data indicates that “Cost” is the most important characteristic to the respondent, and the interface (350) may be provided where quantitative response data indicates that “Brand” is the least important characteristic to the respondent. Response data from the interface (340) may be used to lend credibility to the respondent's prior response data on “Cost” being important, as the quantitative response may indicate that the cost of shoes being low, average, or high is important, while the qualitative video response may indicate how closely the cost correlates to satisfaction. Response data from the interface (350) may be used to lend credibility to the respondent's prior response data on “Brand” being unimportant, with the quantitative response confirming whether it is truly unimportant, or just less important than other quantitative responses, and the qualitative response data being usable to lend further credibility to the response data (e.g., a submitted picture shows several pairs of shoes that image analysis indicates are of different brands, or the same brand) or to provide other insights related to the respondent (e.g., several pairs of shoes that image analysis indicates are running shoes may suggest an athlete demographic).
A sequence of survey interfaces such as the above may continue until all statically configured and/or dynamically generated survey interfaces have been provided to the respondent, or may continue indefinitely with the server (200) determining, upon each submission, some aspect of the prior quantitative response data, qualitative response data, or both to further investigate for the purposes of providing rich data for insight determination.
A qualitative prompt may also be configured (402), which may include providing text, images, or other materials that present a question or instruction for providing qualitative response data in one or more mediums. One or more platforms that the survey is intended for may also be configured (404), which may include specifying user devices, operating systems, software applications, web services or websites, or other platform interfaces (202) by which the survey may be conducted. Qualitative input types may also be configured (406), which may depend in part upon the configured platforms (404) to determine which mediums are available for qualitative response input. For example, where the survey is configured for mobile devices (204) such as smartphones, qualitative input types may utilize built-in cameras, microphones, or other sensors in order to capture qualitative response data in one or more mediums. Where the survey is configured to be conducted by website, agnostic to particular devices or device capabilities, qualitative input types may instead be limited to free form text or other device agnostic input mediums.
A set of survey form data may be generated (408), which may include generating a software application, electronic message content, software application module, web location, or other dataset that may be executed or accessed by respondent user devices in order to view and participate in the survey. As an example, where a survey may be delivered via email, website, or hybrid mobile application, generating (408) the survey form dataset may include generating HTML content that will display question prompts and input options when interpreted by respondent devices.
Qualitative response data may then be received (416) via the survey interface, in one or more mediums, as may be configured for a particular survey. The survey interface may allow respondents to review or replace qualitative response data, such as by playing a preview of captured video comments prior to submission. As with prior response data, received (416) qualitative response data may be submitted to the server (200) based upon the respondent clicking a submit button or taking some other action via the survey form, or may be captured asynchronously as it is received at the survey interface (e.g., video or audio capture may be streamed to the server (200) as it is captured).
Survey interface context may also be received (418) by the server (200). This contextual data may include device and application level variables, information, or other characteristics that describe a particular respondent's interactions with the survey interface. As an example, this may include website session context, user activity tracking information (e.g., mouse locations, click locations, touch locations, and other form interactions, which may be tracked by a software application or script), time spent taking the survey, time between receiving the survey and opening or completing the survey, and other information. Such context information may be useful, as it may indicate the order in which questions were answered (e.g., did the respondent answer the quantitative, or qualitative prompt first?), whether the user changed any responses prior to submitting (e.g., revising a quantitative response after capturing a qualitative response), and other user activity on the survey form.
As an example of this multi-modality approach, video is a data dense medium that can contain useful information but is mixed with spurious data that can hinder the extraction of meaningful insights. Spurious data could include background noise, video frames where the subject is off screen, and verbal utterances such as “um”, “uh”, or other verbal or non-verbal pauses. Some of the spurious elements can be removed or mitigated by decomposing the video data into multiple modalities: still images, audio, and text transcripts. However, the richness of the data can be lost when examining only one component of video responses. The analysis addresses this risk by decomposing the video into various modalities which can be analyzed independently and then combined to produce multi-modal insights. While additional examples are provided below, one example of a multi-modal analysis would be combining audio and textual sentiment with facial expressions derived from video frames to detect sarcasm and other inconsistent responses. In such a case, natural language processing or other text analysis may determine an incorrect sentiment because sarcasm is often lost in text-only responses.
Referring to
Received audio input (424), as well as audio track data from a received video input (422), may undergo audio analysis (434) to identify characteristics of the audio track such as voice tone, emotion (e.g., enthusiasm, sadness, disinterest), presence of music or other audio sources in the capture environment, or other information, and also to produce a text transcription. Text transcription, as well as free form text inputs (426), may undergo text analysis (436) to identify the underlying meaning of the text, and semantic relationships and content, for example.
General sentiment analysis may also be performed (440) based upon raw response data, as well as upon the results of various other analyses. For example, this may include using quantitative input (444), session context input (442), sensor input (438) (e.g., heart rate or other physiological indicators that may suggest a certain emotion or frame of mind), audio analysis (434), text analysis (436), image subject analysis (428), image background analysis (430), and other response data to determine the respondent's sentiment as it relates to the qualitative response data (e.g., positive sentiment, negative sentiment, undecided, etc.). The results of sentiment analysis (440), as well as other inputs and analysis results described in the context of
As an example using the timeline in Table 1, the indexed response dataset may be queried by time to determine what response data and analysis is associated with a particular time, such as at the 65 second mark, while a video response was being captured. Such a query might return one or more words of a transcript of spoken words from the video response at that time, as well as image analysis (428, 430) of captured video frames at that time, audio analysis (434) of captured audio tracks at that time, text analysis (436) of the transcript of spoken words, sentiment analysis (440) based on a variety of response data and other analysis at that time, and session context (442) for how the respondent was interacting with the survey interface at that time. As further example, the indexed response dataset may be queried based upon parameters other than timeline position, such as a word query (e.g., a query for “pet” may return any moments on the survey or video timeline related to “pet”, as well as any image analysis, audio analysis, transcript analysis, or other analysis related to “pet”), image query (e.g., a query image may return similar images, images with similar objects, moments in a video timeline with similar images), sentiment query (e.g., a query for positive sentiment may return moments in a survey timeline or video timeline where analysis identified positive sentiment, or images, audio tracks, transcript text, or other text associated with positive sentiment), or other query types.
While the indexed response dataset has been described as a data structure that may be “queried”, it should be understood that it may be implemented in varying forms. In some implementations, the indexed response dataset may be stored as records in a relational database, while in other implementations the indexed response dataset may be stored as a set of object oriented programming data structures (e.g., an object of type “indexedResponseDataset” may include its own sub-objects, collections, and variables, as well as its own function calls to allow “querying” of the data structure as has been described). Other possible implementations of the indexed response dataset exist and will be apparent to those of ordinary skill in the art in light of this disclosure.
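A minimal sketch of the object oriented form mentioned above might look like the following Python fragment, where the record fields and query methods are illustrative assumptions rather than a definitive schema:

from bisect import bisect_right


class IndexedResponseDataset:
    def __init__(self):
        # Each record might hold, e.g.: {"t": seconds_into_response,
        # "word": ..., "image": ..., "audio": ..., "sentiment": ..., "session": ...}
        self._records = []

    def add(self, record):
        self._records.append(record)
        self._records.sort(key=lambda r: r["t"])  # keep time-ordered

    def query_by_time(self, t_sec):
        """Return the most recent record at or before t_sec (e.g., the 65 second mark)."""
        times = [r["t"] for r in self._records]
        i = bisect_right(times, t_sec)
        return self._records[i - 1] if i else None

    def query_by_word(self, word):
        """Return every timeline moment whose transcript mentions the word."""
        return [r for r in self._records if word in r.get("word", "")]


ds = IndexedResponseDataset()
ds.add({"t": 64.2, "word": "pet", "sentiment": -0.6})
ds.add({"t": 10.0, "word": "friend", "sentiment": 0.9})
print(ds.query_by_time(65), ds.query_by_word("pet"))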
Additional examples and aspects of video modality analysis exist beyond those described above. For example, a key component in video analysis is the time ordering of the data, as has been described. By preserving timestamps or intervals for objects in other modalities (video frames, audio clips, spoken words or sentences), time series representations of the data can be preserved and interrelated. This allows various data features to be extracted from the sequenced video frames that could not be obtained from still images only, such as automatic inventory of items displayed or presented during the video, or the use of the time sequence of events within the video to tell a story or explain a consumer journey such as the steps taken during a buying process. A further advantage is that still image analysis, without the context provided by a sequence of frames, can generate spurious results by interpreting transient events as significant. For example, still images of people speaking often show funny facial expressions that are incorrectly interpreted when analyzing sentiment or other characteristics. Video analysis allows for multiple frames to be analyzed together, within the same context, to remove spurious results. As another example, analyzing still images for a frown or a smile may produce erroneous results since a smile does not instantly appear but develops over a series of frames. Dedicated video analysis within a multi-modality analysis approach allows the detection of genuine facial expressions while filtering spurious results.
As further example of image analysis, images (e.g., one or more still images, or video decomposed into a sequence of independent still images) received by the system may be analyzed in various ways. Data features extracted from each still image may include objects recognized within the frame such as a shoe, soda can, person, or other object. This allows for video surveys where the respondent can visually show their response to a question such as “what is inside your refrigerator?” Other data generated by image analysis may include brands recognized within the frame, which may allow for video surveys where the brands of the respondent's shoes can be detected if they were asked to show their shoe collection, for example. Other image analysis may include facial expression detection, to allow for facial sentiment analysis and the detection of perceived interest or enthusiasm, as well as detection of physical features of the respondent such as hair color, hair texture, type of haircut, skin complexion, makeup usage, eye color, perceived age, and other characteristics, which allows for subgrouping respondents for analysis without explicitly asking for such information. For example, a hair care product might be positively viewed by people with curly hair while negatively viewed by people with straight hair.
Data features extracted from still images can be combined to provide deeper insights such as combining detected facial expressions with an object presented to extract sentiment towards the object. Further, due to the plug-in nature of the machine learning models and the multi-modal analysis framework, customized models can be developed and used. For example, a drop-in, product-space specific machine learning model may be configured to identify significant objects in still-frames extracted from the video and filter spurious objects (e.g., rather than trying to recognize every object depicted in an image during a survey related to shoes, a shoe specific, trained object recognition model may be executed based upon a pre-configured topic of focus for the survey, or based upon a prevalence of a topic exhibited by the respondent's inputs, while separate models trained to identify cars, product logos, major home appliances or other objects are available to the system but not executed).
As further example of audio analysis with a multi-modal analysis framework, audio tracks (e.g., from a video, or direct audio recording) may be analyzed independently to extract data features that are recombined with video features for multi-modal analysis. Example data features and analysis tasks may include audio emotion analysis, detection of enthusiasm or apathy based on vocal inflections, detection of nervousness or confidence in voice, and accent detection. Audio analysis can leverage the power of video surveys by using questions that require multi-media responses, such as prompting the respondent to play their favorite workout music while doing an exercise as part of their response. The audio properties of the music played can be used to partition respondents into subgroups based on preferences without directly asking for their musical preferences. As has been described, another significant use of audio response data is to produce a text transcript of speech through automated audio analysis and transcript generation.
As further example of text analysis, converting speech into text transcripts allows the use of the richness of language to extract a significant amount of information from a video. For example, natural language processing (NLP) of text transcripts that preserves the connection between words and sentences and the timestamps at which they were spoken allows the combined synthesis of visual, audio, and textual data.
As further example of demographic analysis, the ability of video surveys to collect quantitative and demographic data as well as video data allows the user to move between levels of granularity in the obtained data. Including all respondents in the analysis gives an overview of the results but can lose sight of some finer details. Having the ability to obtain direct data for differentiating subgroups as well as determining implicit partitions of the data allows for both hypothesis testing as well as exploratory data analysis.
As another example of a multi-modality analysis, consider an example where response data is being provided related to hair care products and different types of hair. Individual frames from the video are analyzed to determine the hair type of the respondent, which is then used to augment the text transcript derived from the audio. This augmented data can be used to perform a subgroup analysis of the results from natural language processing of the transcripts. Additional data modalities can be used to further augment the data used in the main analysis. In this survey, respondents were not asked about their hair type, but the survey provider was interested in whether there were differences in the responses based on the respondent's hair type. The main data analysis may be performed using NLP on text transcripts derived from the audio track of response data. Still images extracted from the video may be used to identify the various hair types of the respondents, and such analysis may be merged with the transcripts. Based upon the analysis, the survey provider is now able to examine the differences between the entire respondent population and subgroups based on their hair type. In this example, the use of additional data modalities extracted from the same videos as the main data allows the user to explore the dataset in new ways without having to redesign the survey or provide a new survey.
As further example of sentiment analysis, a set of video transcripts may be parsed to extract parts of speech and word dependency relationships. Noun phrases are extracted from the parsed transcripts, and sentiment analysis is performed on the various sentences containing each noun phrase. Each noun phrase is clustered semantically to group words with similar meanings. Statistical analysis is performed on the sentiment scores for each cluster to provide the user with the overall sentiment within the selected video for each noun-phrase cluster. The results of the statistical analysis are provided to the user through various visualizations that provide interactivity to allow the user to examine the overall results as well as sentence-level results. Based upon dependency relationships, the user can explore the relationship between various noun phrases that are viewed positively, negatively, or neutrally using relation graphs. The user can select various noun phrases to jump to the timestamp in the corresponding videos where the sentence was spoken as well as generate vignettes that combine each utterance of a noun phrase. The resulting positive, negative, and neutral sentences can be combined to form new datasets for further analysis using different methodologies.
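The following Python fragment sketches a simplified version of that pipeline. spaCy supplies the parsing and noun phrase extraction; the sentiment scorer and the head-lemma grouping are crude stand-ins for the trained sentiment models and semantic clustering an actual implementation would use:

import statistics

import spacy  # requires: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")


def sentence_sentiment(text: str) -> float:
    # Placeholder scorer; a real system would use a trained sentiment model.
    return -0.8 if "hate" in text else 0.5


transcripts = [
    "I love having my friends over. I hate cleaning the cat box.",
]

cluster_scores = {}
for doc in nlp.pipe(transcripts):
    for sent in doc.sents:
        score = sentence_sentiment(sent.text)
        for chunk in sent.noun_chunks:
            key = chunk.root.lemma_.lower()  # crude grouping by head lemma
            cluster_scores.setdefault(key, []).append(score)

# Overall sentiment per noun-phrase cluster within the transcripts.
for cluster, scores in cluster_scores.items():
    print(cluster, statistics.mean(scores))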
Depending on the pre-configured top level inquiries to the response dataset, the server (200) will then execute one or more queries against a response dataset, such as a previously produced (446) indexed response dataset. Continuing the above example, this may include querying (456) for topic related insights (e.g., “what did response data indicate about pets?”), demographic related insights (460) (e.g., “what did response data indicate about respondents that live in major cities”), sentiment related insights (464) (e.g., “which topics that were discussed in qualitative response data were most positive, and most negative”), or sentiment conflict related insights (e.g., “where does quantitative response data deviate from qualitative response data”). Results of such queries may be received (470) as result sets, objects, or other data sets or collections that describe the related top level insights, and may include for example, quantitative responses, portions of qualitative responses (e.g., data representative of a qualitative response, or usable to identify or access a qualitative response), other related inputs (e.g., session context input or sensor input), and any analysis (e.g., results of image, audio, transcript, or sentiment analysis) that relates to the top level insights.
As further example, the received (470) result in response to “what did response data indicate about pets” might identify portions of transcripts where respondents talked about topics related to pets, or might identify audio, video, or images captured while respondents talked about topics related to pets, and sentiment determinations related to a respondent's discussion of pets. The server (200) may also determine portions of any associated (472) qualitative response data that is related to the top level insight result set, which may include retrieving, accessing, providing access, or otherwise preparing for presentation audio, video, or still images related to the top level insights.
Further examples and aspects of visualizations include a radial dendrogram that shows hierarchical data. When finding answers using natural language processing, the responses often use distinct words that have similar meanings. Being able to cluster distinct responses semantically helps reduce the noise in the data and gives more concise results. Concise results can sometimes hide important information, so being able to change from a coarse-grained, concise set of answers to the fine-grained set of actual responses is helpful. This can be done in an interactive way by using visualizations that transition between coarse and fine-grained results. At each level, the user can also find the specific clips where the respondents stated their answers or generate a montage video of all the answers within that cluster. The visualization also allows for the selection of a set of videos based on the answers for further analysis. For example, the user could select all the respondents who stated their favorite thing is their cat and then display the answers to a follow-up question of why their cat is their favorite. The ability to change the level of detail in the synthesis results, query the original video clip that produced the specific result, and subgroup data based on the synthesis results for further analysis using the visualization is a powerful tool for exploring large datasets.
Returning to
The server (200) may also determine (486) one or more qualitative response vignettes by identifying sections of audio or video qualitative response data that are related to insights, such as identified by the associated (472) qualitative response data, and may select these vignettes as separate portions of qualitative response data (e.g., multiple separate short video clips), or combine them together into a single portion of qualitative response data (e.g., an aggregate video clip containing multiple short video clips). The server (200) may then provide (488) an insight interface to a user that describes determined insights, and that may also include visualizations (480) of insights, summaries (482, 484) of insights, and qualitative response vignettes (486) or samples.
Other features of the insight interface may include automatic highlighting and presentation of transcript text based upon configured keywords, or based upon a user's past use of a manual highlighting tool (e.g., the user always highlights sentences that contain a variation of “love”), or based upon top level insight determinations and/or follow up questions (e.g., a configured top level insight related to the topic “shoe” may cause each sentence that contains that word to be highlighted). Portions of qualitative response data that are manually reviewed via the insight interface, such as by viewing a vignette, reviewing a highlighted transcript, or by asking a follow up question such as “show me all transcript data related to the top level insight ‘Pets’”, may be marked within the interface as having been manually reviewed in order to further reduce the time spent on manual review by preventing duplicate review of the already filtered qualitative response data.
When a conversational query is received (506), the server (200) may query (508) the indexed response dataset and receive (510) a follow up insight result set, which may contain additional or different response data and insights than the initial top level insight result set. The insight interface may then be updated to add (512) additional visualizations based on the follow up question, to add (514) additional insight summaries based on the follow up question, and to add (516) additional qualitative response vignettes based on the follow up question. In some implementations the additional (512, 514, 516) insight data may be presented instead of previously displayed data (e.g., a new page may load), while in some implementations such additional insight may be added or built into the already displayed insight interface.
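A minimal sketch of that cycle, reusing the IndexedResponseDataset sketch above (the interface structure and field names are illustrative assumptions):

def handle_follow_up(dataset, insight_interface, topic):
    result_set = dataset.query_by_word(topic)          # query (508), receive (510)
    insight_interface["visualizations"].append(        # add visualization (512)
        {"type": "word_cloud", "topic": topic, "hits": len(result_set)})
    insight_interface["summaries"].append(             # add summary (514)
        f"{len(result_set)} moments mention '{topic}'")
    insight_interface["vignettes"].append(             # add vignette clip times (516)
        [r["t"] for r in result_set])
    return insight_interface


ui = {"visualizations": [], "summaries": [], "vignettes": []}
# e.g., handle_follow_up(ds, ui, "pet") using the dataset built earlier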
As an example,
While controls for vignette viewing may be provided based upon top level insights or configured topics of interest, as has been described, the interface may also allow for more free form vignette selection, compilation, and viewing based upon the indexed response dataset. For example, a user may query the indexed response dataset (e.g., by constructing a plaintext query, or by interacting with a query builder tool or interface) to see vignettes of video clips that meet criteria such as “positive sentiment about Object A”, “negative sentiment about Object B”, “respondent wearing red clothing”, “respondents that discussed sustainability”, and other queries. A query builder interface that provides query options based upon the indexed response dataset may be advantageous for such free form queries, as it may provide a selection that allows a user to view a vignette of respondents wearing red only when the indexed response data actually contains image analysis suggesting that some significant number of respondents wore red (e.g., the interface may prevent a query for vignettes of “respondent wearing green clothing” where no respondent wore green clothing while recording qualitative video response data).
At each state of the interface (e.g., illustrated in
Other advantages and features of conversational querying or analysis exist. For example, a common component of surveys is asking open-ended questions that allow the respondent to reply in an unstructured way. While this survey approach can provide more detailed information when compared with multiple choice or quantitative questions, it requires more effort to extract meaningful insights, and is often not feasible using manual review. The conversational analysis approach allows the user to interrogate a set of videos using either the questions in the original video survey or other useful questions derived when analyzing the videos.
With this approach, users can interrogate a set of videos by asking questions after the creation of the video survey response data. The framework provides users the groupings of semantically-similar answers along with measures of the quality of the analysis and the video and timestamp of each answer. A user may select a pre-processed set of videos and provide one or more questions to be answered. The conversational analysis service then determines the best answers to the questions within each video as well as measures of how accurately it believes it found the answers. The answers are then clustered together in a semantically-similar manner where different words and phrases with similar meanings are grouped together. This reduces the number of unique answers and provides a more human-like result. The clusters can then be examined more closely to obtain the exact answers spoken in the video. The clustering can also be used to generate a vignette of video clips that provides the user with each person in the video speaking their answer to the given question. The resulting answer data features are stored for future analysis. For example, the sentences containing the answer for a question from each video can be combined with quantitative or demographic data, summarized, or analyzed for sentiment.
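One way such semantically-similar clustering might be approximated (an assumption offered for illustration, not the disclosed method) is with off-the-shelf sentence embeddings and agglomerative clustering:

from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

answers = ["my cat", "our kitty", "playing guitar", "my guitar"]

# Embed each answer so that similar meanings land near each other.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(answers)

# distance_threshold controls how loosely answers are grouped together.
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=1.0).fit_predict(embeddings)

for answer, label in zip(answers, labels):
    print(label, answer)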
After determining (520) a first set of top level insights, the server (200) may determine (522) one or more closely related topics that may be asked about based upon a pre-configured set of follow up queries (e.g., “ask about the most commonly used noun”, “ask about cats if that is a commonly used noun”, “ask about the noun associated with the most positive sentiment”). As an example with reference to
Where a configured follow up question exists and can be automatically determined while the respondent is still engaged with the survey interface, the server (200) may automatically generate a corresponding follow up question and update (536) the survey to include the new quantitative and/or qualitative prompt. Continuing the above example, the server (200) may identify a topical (524) follow up question due to the discussion of Pets identified in the top level insights, and may create (526) a topical follow up question set such as “Rate how happy cats make you between 1 and 10, and then record a short video letting us know what you think about cats.” The server (200) may also identify a positive sentiment (528) related follow up question due to the high positive sentiment associated with “Friends”, and may create (530) a positive sentiment follow up question set such as “We get the feeling that friends make you happy, let us know with a ‘Yes’ or ‘No’ whether you plan to spend time with friends in the next ten days, and record a short video about your plans.” The server (200) may also identify a negative sentiment (532) related follow up question due to the high negative sentiment associated with “Insects”, and may create (534) a negative sentiment follow up question set such as “We see that you like summer, but that you're not a big fan of insects. Let us know on a scale of 1 to 5 how likely insects are to affect your summer plans, and record a short video explaining why.”
As each follow up question is created (526, 530, 534), the server (200) may update (536) the survey interface to reflect the new quantitative and/or qualitative question prompts, ideally while the respondent is still engaged with a prior prompt or other aspect of the survey interface. Updating (536) the survey interface will depend upon a particular implementation and platform, but may include, for example, appending additional sections onto an HTML page or within a software application interface, or providing a button to proceed to a next page of the survey instead of a button used to complete or submit the survey via an interface. In some implementations, the update (536) to the survey interface will be performed seamlessly, such that the follow up question may be asked in near real time based upon already provided response data, such that the respondent is still engaged with the survey interface, and their previously provided response data is still fresh in their mind.
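A minimal sketch of how the follow up question sets above might be generated from templates keyed to insight type (the templates, thresholds, and trigger logic are illustrative assumptions):

# Templates keyed to the insight types in the example above.
FOLLOW_UP_TEMPLATES = {
    "topical": ("Rate how happy {topic} makes you between 1 and 10, then "
                "record a short video letting us know what you think."),
    "positive": ("We get the feeling that {topic} makes you happy. Record a "
                 "short video about your plans involving {topic}."),
    "negative": ("We see you're not a big fan of {topic}. Rate from 1 to 5 "
                 "how much {topic} affects your plans, and record a short "
                 "video explaining why."),
}


def create_follow_up(insight):
    """insight: {"topic": str, "sentiment": float} from the top level insights."""
    if insight["sentiment"] >= 0.5:
        kind = "positive"
    elif insight["sentiment"] <= -0.5:
        kind = "negative"
    else:
        kind = "topical"
    return FOLLOW_UP_TEMPLATES[kind].format(topic=insight["topic"])


print(create_follow_up({"topic": "friends", "sentiment": 0.8}))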
As further example of real time conversational surveys, conversational video surveys can be tailored to the aspects of the respondent's partial response data. This is possible by leveraging the multiple data sources that video provides (image, audio, and text), which can be analyzed in parallel to provide varying data characteristics. In this manner, partial response data received from the respondent may be interpreted to adapt or add survey questions automatically. Some examples that would be unique to this approach include responding, with new or different questions, to keywords or phrases spoken by the respondent and identified through NLP, using cues from facial emotions (e.g., from image or video analysis), voice emotion (e.g., from audio analysis), and text sentiment to identify concerns that were not known beforehand to create immediate, same session, follow up questions, identifying objects in the video (e.g., by image analysis) and providing immediate questions concerning those objects (e.g., if an apparel brand is detected from a logo the respondent could receive a question relevant to that brand), or suggesting the use of other features, such as augmented reality, based on cues from the respondent (e.g., where a transcript indicates the user is curious what a shoe would look like when worn, the system may suggest using augmented reality to simulate that scenario and provide a button or other interface to activate the augmented reality view).
In some implementations, an insight interface may allow users to perform conversational queries against response data with various time-based search limitations. For example, where the indexed response dataset includes responses from surveys that have been provided to a common respondent group multiple times (e.g., once per month), the dataset may be conversationally queried at its most general level, which would treat all response data as “timeless”; in other words, response data from a single respondent that has been collected once per month over a six month period would be treated as an aggregate rather than as separate instances of response data.
In the above instance, conversational queries could also be given time parameters to limit the response data that is queried in varying ways, where the user does not wish the response data to be treated as an aggregate. An example of a time parameter may be to query the response data based upon its age, which would provide insights based upon historic snapshots of the response data. For example, this could include a query for information such as “positive sentiment for pets one month ago” or “positive sentiment for pets six months ago”. Such a query could filter the indexed response dataset to only portions that are based on response data that was received one month ago, or six months ago, respectively. Another example of a time parameter is to query the response data based upon a respondent's relative “age” within the response data, rather than the strict age of the data. As an example, imagine a first respondent who has responded to surveys related to a particular cat food once per month over a period of six months, and a second respondent who has responded to surveys related to the particular cat food once per month over a period of three months. A query based on a relative “age” of three months would query the indexed response dataset based upon a respondent's third month of response data, regardless of when the response occurred. The resulting dataset would describe insights for any respondent that had spent three months with the particular cat food, regardless of when those three months occurred.
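A sketch of those two time parameters using pandas (the column names, dates, and monthly cadence are illustrative assumptions):

import pandas as pd

responses = pd.DataFrame({
    "respondent": [1, 1, 1, 2, 2],
    "collected": pd.to_datetime(
        ["2021-01-05", "2021-02-05", "2021-03-05", "2021-04-10", "2021-05-10"]),
    "sentiment": [0.2, 0.4, 0.6, -0.1, 0.1],
})

# Absolute age: only responses collected during a given calendar month.
march = responses[responses["collected"].dt.to_period("M") == pd.Period("2021-03", "M")]

# Relative age: each respondent's third month of response data, whenever
# those months occurred; rank() orders each respondent's responses in time.
responses["month_index"] = (responses.groupby("respondent")["collected"]
                            .rank(method="first").astype(int))
third_month = responses[responses["month_index"] == 3]
print(march, third_month, sep="\n")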
While the discussion of
For example, a survey relating to a shoe may not be pre-configured to focus on the shoelaces, but where responses from one or more respondents show positive or negative sentiment that exceeds the general sentiment towards the shoe by more than a configured threshold, it is advantageous to ask a dynamic follow up question in real time, while the respondent is still engaged, in order to explore the unexpected sentiment.
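A minimal sketch of that threshold check follows (the threshold value and ratings are illustrative assumptions, mirroring the shoe example below):

DEVIATION_THRESHOLD = 3.0


def find_unexpected_aspects(aspect_ratings: dict) -> list:
    """Flag aspects whose rating deviates from the respondent's own average
    by more than the configured threshold, triggering a dynamic follow up."""
    mean_rating = sum(aspect_ratings.values()) / len(aspect_ratings)
    return [aspect for aspect, score in aspect_ratings.items()
            if abs(score - mean_rating) > DEVIATION_THRESHOLD]


ratings = {"comfort": 5, "style": 5, "durability": 4, "shoelaces": 9, "sole": 4.5}
print(find_unexpected_aspects(ratings))  # -> ['shoelaces'] (average is 5.5)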
Continuing the above example, a pre-configured question might ask a respondent to rate different aspects of the shoe on a scale of 1-10, and the respondent's input might average out to 5.5, while their score for the shoelaces is a 9 (e.g., the same example could apply to video or audio responses from the respondent, as well as other response mediums, as described in the context of
In some implementations, the system may be configured to provide an additional interim question or prompt designed to occupy the respondent while the system, in parallel, identifies related insights and creates dynamic questions. As an example, this may include, immediately after receiving the respondent's numerical ratings of aspects of the shoe, providing a prompt for the respondent to provide audio or video feedback for 30 seconds, completing creation of the dynamic questions in parallel with the respondent providing the 30 second feedback, and then providing the now prepared dynamic questions immediately after the 30 second feedback is completed. Interactive games, riddles, or other compelling interactions may be provided to the respondent instead of or in addition to audio/video prompts (e.g., such as a clickable logic game or puzzle being displayed along with the message “Thanks for that last answer, we love it! As a reward, try to solve this logic game—only 48% of our respondents are able to complete it!”)
Such approaches are advantageous over conventional survey interfaces because they are able to ask follow up questions dynamically while the respondent is still engaged, and are able to maintain the respondent's attention during periods of time required to identify and generate follow up questions in order to maintain the real-time nature of the dynamic questioning even where the time required to generate the questions exceeds the short window in which the respondent expects a new survey interface to be loaded (e.g., such as the 0.5 to 2 second window in which a new web page interface might be loaded).
Steps to perform the above may include presenting (730) a survey question via a survey interface and receiving (732) a response to the question, as has been described above (e.g., a quantitative and/or qualitative response, in one or more mediums such as text, numerical, video, audio, etc.). After receiving (732) the response, the system may update (734) the insight dataset based on the response, and then determine whether the response impacts (736) any pre-configured topics of particular interest (e.g., such as described in the context of
The system may also, after receiving (732) the response and in parallel with performing steps towards generating (740) real time questions, determine that no real time questions are currently prepared (742) and select (744) as the next survey question a pre-configured survey question (e.g., a static question that the survey provider had configured to be asked whether or not dynamic real time questions are generated) or an interim question (e.g., a time-consuming question such as described above, including a 30 second audio or video prompt, a short interactive logic game, a riddle, etc.) designed to occupy the respondent while the real time questions are generated (740). Where the system determines that a real time question is prepared (742), the system may instead select (746) the prepared real time question. In either case, the selected (744, 746) question may then be presented (730) to the respondent via the survey interface, and the steps of
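A sketch of that selection step (the queue mechanics and question text are illustrative assumptions):

import queue

real_time_questions = queue.Queue()  # filled asynchronously by generation (740)

INTERIM_QUESTIONS = [
    "Record 30 seconds of audio or video feedback on anything we missed.",
    "While you wait: try this quick logic puzzle!",
]


def select_next_question(preconfigured):
    try:
        # A prepared real time question takes priority (746).
        return real_time_questions.get_nowait()
    except queue.Empty:
        # Otherwise fall back to a pre-configured question, or an interim
        # prompt that occupies the respondent while generation continues (744).
        return preconfigured.pop(0) if preconfigured else INTERIM_QUESTIONS[0]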
While
To perform the above, the system may receive (800) a survey request from a respondent (e.g., clicking on a link that loads in a web site or software application) and may identify (802) the origin of that request (e.g., by querying a database table that relates unique links, parameters, attributes, or other unique information to an origin). The system may identify (804) respondent characteristics based upon the identified (802) origin and/or other information known about the respondent (e.g., such as may be associated with a user account of the respondent, stored in cookies or other tracking technologies associated with the respondent, or otherwise). Identifying (804) respondent characteristics may include determining that the respondent is likely to have a positive or negative sentiment (e.g., users entering the system from a link associated with a troubleshooting or complaint page or process), that the respondent is likely to be an experienced user of the product, or that the respondent has likely never interacted with the product, all based upon their origin.
The system may add (806) respondent profile data (e.g., including origin) to the insight dataset, and may provide (818) dynamic survey questions in real time based in part on the origin, as described above in the context of
When adding (806) the respondent profile to the insight dataset, the system may organize the dataset to relate and present insights specific to that origin or category of origin, which may allow a company to, for example, categorize and view insights and sentiment analysis only for respondents whose origin was an electronic mail or text message, or to view insights and sentiment analysis excluding such respondents.
Origins for which it may be particularly advantageous to organize, filter, and separately present (808) corresponding insights include, for example, an optical or wireless code scan (e.g., a product itself, or product packaging, may include a QR code or other optical code, or an RFID tag or other wireless tag, that may be interacted with by a user device to initiate the survey). Respondents originating from such a code scan are known to have the product in-hand, and so their responses may be more highly valued or weighted as compared to other origins, or may be otherwise treated differently.
As another example, it may be advantageous to separately present (810) insights where the origin is a social media website. Responses originating from a social media website may be integrated into the insight dataset with a lower weight or at a lower impact, or, in the case of responses originating from a sub-section of a social media website that is dedicated to the topic of the survey, may be integrated at a higher weight or higher impact.
As another example, it may be advantageous to separately present (812) insights where the origin is an electronic mail or text message, to separately present (814) insights where the origin is a first-party website associated with the topic (e.g., a manufacturer's website for a product that is the topic of the survey), or to separately present (816) insights where the origin is an image capture and recognition process executed on a user device (e.g., rather than scanning a code to identify the product and origin, an image of the product itself may be uploaded and analyzed to identify the product and initiate a survey).
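One hedged reading of this origin-based filtering and weighting is sketched below; the weight values are illustrative only and are not specified by the disclosure:

```python
# Illustrative per-origin weights: code-scan respondents are known to have
# the product in hand, so their responses may be weighted more heavily
# than, e.g., general social media traffic.
ORIGIN_WEIGHTS = {
    "code_scan": 1.5,
    "first_party_site": 1.0,
    "email": 1.0,
    "image_recognition": 1.25,
    "social_media": 0.5,
}

def weighted_sentiment(responses, include=None, exclude=None):
    """Aggregate sentiment, filtered by origin and weighted per origin.
    Each response is a dict like {"origin": ..., "sentiment": -1.0..1.0}."""
    total = weight_sum = 0.0
    for r in responses:
        if include and r["origin"] not in include:
            continue
        if exclude and r["origin"] in exclude:
            continue
        w = ORIGIN_WEIGHTS.get(r["origin"], 1.0)
        total += w * r["sentiment"]
        weight_sum += w
    return total / weight_sum if weight_sum else None
```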
As another example of an input type by which a respondent may provide input as part of a mixed medium response, the survey interface may render an augmented reality (AR) object associated with the survey topic (e.g., a three-dimensional model of a shoe) as an overlay upon images captured by a camera of the user device, and may provide controls for the respondent to move, rotate, and otherwise interact with the AR object.
The survey interface (e.g., a web location rendered via a browser, or a native software application installed on the user device) is configured to track modifications (824) of the AR object position. Throughout a user's interactions with the AR object, the system receives (826) data that populates an AR object position and orientation timeline, which is added (828) to the insight dataset and used as an additional data layer and input source for mixed response analysis. As AR position and orientation timeline data is added and the insight dataset is updated, the system may also provide (840) dynamic survey questions based at least in part on the position and orientation timeline. For example, where a respondent is interacting with an AR object representing a shoe during a survey asking questions related to the shoe, the system may receive and analyze various respondent inputs (e.g., video, audio, text, and other inputs such as those described above).
Continuing this example, the respondent may be providing audio and video feedback in response to a prompt while moving or modifying (824) the position and orientation of the AR shoe, and the resulting insight dataset (e.g., based on sentiment analysis of images (420), video (422), audio (424), transcription (424), etc.) may indicate that the respondent's sentiment towards the shoe varied based upon the position and orientation of the AR shoe (e.g., sentiment may be very positive while viewing the shoe from a side profile view, but may become negative when viewing the shoe from behind, or from below). In response, the system may provide a dynamic survey question, as has been described above, based on the position and orientation influenced insight dataset. As an example, such a dynamic survey question might include rendering the AR shoe at the position and orientation that elicited the most positive sentiment, and providing a prompt for the respondent to provide more audio/video responses describing what they particularly liked about that part of the shoe (e.g., “We noticed that you really liked the shoe from this angle, tell us why!”), or prompting the respondent to touch the AR object rendered on their user device display touchscreen to indicate their favorite visual feature of the shoe from that angle (e.g., “We noticed that you really liked the shoe from this angle, touch the portion of the shoe that you think looks great!”).
As an alternate example of incorporating (828) the position and orientation timeline into the insight dataset and providing (840) dynamic questioning based thereon, the system may produce additional inputs for the insight dataset based on the combination of the timeline with other respondent inputs (e.g., such as those described above). For example, sentiment determined from a respondent's audio or video feedback during a period of time may be combined with the AR object's orientation during that same period to produce a derived input that relates sentiment to particular views of the object.
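A minimal sketch of how such a derived input might be computed, assuming the sentiment series and orientation timeline are both lists of timestamped tuples; all names here are illustrative:

```python
from bisect import bisect_right

def orientation_at(timeline, t):
    """Return the latest (timestamp, yaw, pitch) entry at or before time t.
    `timeline` is the position and orientation timeline added at (828)."""
    times = [entry[0] for entry in timeline]
    i = bisect_right(times, t) - 1
    return timeline[i] if i >= 0 else None

def sentiment_by_orientation(sentiment_series, orientation_timeline):
    """Join per-moment sentiment scores (e.g., from audio/video analysis)
    with the concurrently recorded AR orientation, yielding a derived
    input relating sentiment to particular views of the object."""
    joined = []
    for t, score in sentiment_series:  # e.g., [(12.0, 0.8), (13.0, -0.4)]
        entry = orientation_at(orientation_timeline, t)
        if entry is not None:
            joined.append((entry[1:], score))  # ((yaw, pitch), sentiment)
    return joined
```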
As a further example, the system may organize and present (832) insights based upon other physical, real-world objects that are detected as being proximate to the AR object. In this example, the system may analyze an image over which the AR object is rendered, and may identify a physical object or attributes of a physical object (e.g., using object recognition techniques) present within that image, which may be used to provide additional insights and/or provide (840) dynamic survey questions. For example, where the AR object is a shoe, the respondent may be prompted to position the AR shoe near the leg opening of a pair of pants that they would likely wear with the shoe. The system may capture that image and use object recognition techniques to determine the color, style, or other characteristic of the pants near which the shoe is placed, which provides additional useful insights and opportunities for dynamic (840) questioning. As one example, the system may determine that the pants are blue, but that the respondent's sentiment was negative while the AR shoe was positioned there, and as a result may prompt the user: “Maybe blue wasn't a good choice, try it with a pair of black pants!” As another example, the system may determine that the pants are a blue jean material, and may prompt the user: “Looks good with blue jeans, now try it with something a little more formal!”
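As a rough illustration, assuming the Pillow imaging library is available, a crude average-color check could stand in for the object recognition techniques described above; a deployed system would instead use a trained recognition model:

```python
from PIL import Image  # assumes the Pillow package is installed

def dominant_color(image_path, box):
    """Average the pixels in `box` (left, upper, right, lower): a crude
    stand-in for the object recognition techniques described above."""
    region = Image.open(image_path).convert("RGB").crop(box)
    pixels = list(region.getdata())
    return tuple(sum(channel) // len(pixels) for channel in zip(*pixels))

def pants_follow_up(avg_rgb, sentiment):
    """Turn the detected color and concurrent sentiment into a prompt."""
    r, g, b = avg_rgb
    looks_blue = b > r and b > g
    if looks_blue and sentiment < 0:
        return "Maybe blue wasn't a good choice, try it with black pants!"
    if looks_blue:
        return "Looks good with blue jeans, now try something more formal!"
    return "Tell us how the shoe looks with these pants!"
```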
As a further example, the system may organize and present (836) insights based upon other AR objects that are detected as being proximate to each other, for similar purposes as those described above in relation to proximate physical objects (832). For example, the survey interface may provide several AR objects, and may prompt the user to arrange them relative to each other in some manner (e.g., “Line the AR shoes up from right to left in order of your preference, from most favorite to least favorite” or “Arrange the AR shoes with the AR pants that you think they look best with”).
As a further example, the system may organize and present (838) insights based upon the physical setting in which the AR object is placed. For example, the respondent may be prompted to move the AR object into the room of their dwelling in which they would most likely place or use the object (e.g., “Take this AR wall decoration and place it in the room you think it looks best in”), and the system may detect, based upon captured images or image sequences over which the AR object is rendered, the type of physical room or setting, or characteristics of that room or setting, using object recognition techniques. For example, this may include determining that the AR object was placed in a kitchen or living room based upon detection of objects commonly found in those spaces (e.g., a television, a refrigerator), or may include determining colors prevalent in the room, the level of lighting present in the room, or other characteristics that may be determined based upon captured images of the physical setting in which the AR object is placed. As with other examples, this may be used to provide additional insights and to provide (840) dynamic questioning (e.g., where the AR object is placed in a room determined to be a kitchen, the system may prompt the respondent to provide additional audio/video feedback: “Looks like you prefer the AR wall decoration in the kitchen, how did you come to that decision?”).
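A minimal sketch of such room classification, assuming object labels are already produced by a separate detection model; the label-to-room mapping is illustrative only:

```python
# Illustrative mapping from recognized objects to room types; a deployed
# system would obtain the labels from an object detection model.
ROOM_SIGNALS = {
    "refrigerator": "kitchen", "oven": "kitchen",
    "television": "living room", "sofa": "living room",
    "bed": "bedroom", "toilet": "bathroom",
}

def classify_room(detected_labels):
    """Vote on the room type using objects detected in the captured images
    over which the AR object is rendered."""
    votes = {}
    for label in detected_labels:
        room = ROOM_SIGNALS.get(label)
        if room:
            votes[room] = votes.get(room, 0) + 1
    return max(votes, key=votes.get) if votes else "unknown"

# e.g., classify_room(["refrigerator", "oven"]) -> "kitchen", which could
# trigger the kitchen-specific dynamic prompt (840) described above.
```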
The system may also be configured to assemble selected portions of raw respondent content into a shareable reel. In such implementations, the system may receive (850) an insight selection from an administrative user that is viewing the data (e.g., a selection of positive or negative sentiment for a topic, a selection of all sentiment for a topic). This selection may be made while viewing and interacting with the insight dataset (e.g., via the insight interface described above).
As an example, an administrative user may select the topic “cat,” or may narrow the selection to a topic combined with a related insight (e.g., “cat” plus “cat food”), and may also define a set of clip limitations that constrain the length of the resulting reel and its component clips.
The system may then, based upon the resulting selected topics and insights (e.g., cat, cat plus cat food, etc.), identify raw video and/or audio data from the originally received respondent inputs that are relevant to the selected topics and insights (e.g., either selected randomly from the respondent inputs, or selected based upon the strength or relevance of the associated analysis results), and may identify (856) sub-clips from that video and/or audio data based on the clip limitations (e.g., limit each clip to no more than 10 seconds, limit aggregate duration of sub-clips to no more than 3 minutes).
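One way the sub-clip identification (856) under clip limitations might be sketched is a simple greedy selection, shown below; the relevance scores and default limits are illustrative:

```python
def select_sub_clips(candidates, max_clip=10.0, max_total=180.0):
    """Step (856): choose sub-clips relevant to the selected insight,
    trimming each to `max_clip` seconds and capping the aggregate reel
    duration at `max_total` seconds (3 minutes, per the example above).
    Each candidate is (relevance_score, start, end) within a source video."""
    chosen, total = [], 0.0
    # Take the most relevant moments first (e.g., strongest sentiment).
    for score, start, end in sorted(candidates, reverse=True):
        duration = min(end - start, max_clip)  # per-clip limitation
        if total + duration > max_total:       # aggregate limitation
            break
        chosen.append((start, start + duration))
        total += duration
    return chosen
```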
The system may also identify (858) non audio/video respondent content for inclusion in the reel, which may include free form text responses, structured or strongly typed responses, AR object manipulation and interaction timelines, and other respondent inputs. The system may then create (860) a reel definition based on the identified (856) sub-clips and identified (858) other content. When creating (860) the reel definition, the system may be configured to group the response data into relevant clusters (e.g., 10 video sub-clips showing positive sentiment on a selected insight or topic may be grouped together, followed by 10 video sub-clips showing negative sentiment grouped together, with related text and other identified respondent content displayed as an overlay on video clips or between clip transitions).
First creating (860) the reel as a definition (e.g., a collection of metadata that identifies the included content) instead of as newly generated files (e.g., such as a new video file produced from sub-clips) allows the system to quickly create reels for selected topics and insights without significant processing or storage cost (e.g., the disk size required for a reel definition is insignificant in comparison to the disk size required for a new video file). The system may provide a viewing interface to an administrative user through which the reel may be streamed (862) and presented to the user based on the created (860) reel definition, with the component content being streamed from its original location in real-time based on the definition. The system may also receive (864) download requests from users for particular reels, and may generate (866) a downloadable reel and/or download link based on the reel definition. The downloadable reel may be created as a new video file, slide presentation, universal document format, or other file type, such that the requesting user may download, view, and share the reel as a new standalone file.
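A minimal sketch of a reel definition stored as metadata rather than rendered video, with hypothetical field names; the streaming step simply walks the definition and fetches each component from its original location:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class SubClip:
    source_url: str         # original location of the raw respondent video
    start: float            # seconds into the source recording
    end: float
    overlay_text: str = ""  # related content identified at step (858)

@dataclass
class ReelDefinition:
    """Step (860): a reel stored as metadata only; kilobytes of pointers
    rather than the much larger newly rendered video file."""
    topic: str
    clips: List[SubClip] = field(default_factory=list)

def stream_reel(reel: ReelDefinition) -> Iterator[Tuple[str, float, float, str]]:
    """Step (862): yield each component from its original location, in
    order, based on the definition rather than a pre-rendered file."""
    for clip in reel.clips:
        yield clip.source_url, clip.start, clip.end, clip.overlay_text
```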
It should be understood that any one or more of the teachings, expressions, embodiments, examples, etc. described herein may be combined with any one or more of the other teachings, expressions, embodiments, examples, etc. that are described herein. The following-described teachings, expressions, embodiments, examples, etc. should therefore not be viewed in isolation relative to each other. Various suitable ways in which the teachings herein may be combined will be readily apparent to those of ordinary skill in the art in view of the teachings herein. Such modifications and variations are intended to be included within the scope of the claims.
Having shown and described various embodiments of the present invention, further adaptations of the methods and systems described herein may be accomplished by appropriate modifications by one of ordinary skill in the art without departing from the scope of the present invention. Several of such potential modifications have been mentioned, and others will be apparent to those skilled in the art. For instance, the examples, embodiments, geometries, materials, dimensions, ratios, steps, and the like discussed above are illustrative and are not required. Accordingly, the scope of the present invention should be considered in terms of the following claims and is understood not to be limited to the details of structure and operation shown and described in the specification and drawings.
Claims
1. A system for conducting a survey to collect and analyze mixed medium responses from a plurality of respondents, the system comprising:
- (a) a server comprising a processor and a memory; and
- (b) a data storage configured to store sets of response data received from the plurality of respondents and an insight dataset comprising the results of one or more analyses of a plurality of topics described in the sets of response data;
- wherein the processor is configured to:
- (i) provide a survey interface to a plurality of user devices associated with the plurality of respondents, wherein the survey interface is configured to receive a response dataset from each respondent that includes data of at least two response mediums;
- (ii) for each response dataset, determine a set of topics described in that response dataset, and add that response dataset to the stored sets of response data;
- (iii) determine an attribute for each of the set of topics based on a multi-modal analysis of that response dataset, and add the attribute for each of the set of topics to the insight dataset; and
- (iv) provide an insight interface dataset based on the insight dataset, wherein the insight interface dataset comprises at least a description of the plurality of topics and, for each of the plurality of topics, a topic attribute associated with that topic.
2. The system of claim 1, wherein the determined attribute for each of the set of topics describes sentiment of a respondent corresponding to that response dataset for that topic.
3. The system of claim 1, wherein the stored sets of response data comprise time-indexed raw response data received from the plurality of respondents.
4. The system of claim 1, wherein the insight interface dataset is configured to cause an administrative device to display an insight interface that is configured to:
- (i) for each of the plurality of topics, display that topic and the topic attribute associated with that topic; and
- (ii) provide a set of controls that may be interacted with via the administrative device to, for a selected topic of the plurality of topics, present raw response data from one or more respondents whose response datasets contributed to determining the attribute for the selected topic.
5. The system of claim 4, wherein the processor is configured to, when presenting raw response data via the insight interface:
- (i) select for presentation a set of relevant video and audio from the response dataset based on a time-index association with the selected topic; and
- (ii) omit from presentation any portion of the response dataset other than the set of relevant video and audio.
6. The system of claim 1, wherein the multi-modal analysis includes correlating separate sentiment analysis of the at least two response mediums with each other.
7. The system of claim 6, wherein:
- (i) a first response medium of the at least two response mediums is a text medium associated with a text sentiment analysis, and a second response medium of the at least two response mediums is an audio medium associated with an audio sentiment analysis; and
- (ii) the multi-modal analysis includes correlating the text sentiment analysis with the audio sentiment analysis.
8. The system of claim 1, wherein the survey interface is configured to:
- (i) display a sequence of pre-configured text prompts, wherein each of the sequence of pre-configured text prompts include a question or instruction for providing the response dataset to the survey interface; and
- (ii) for each of the sequence of pre-configured text prompts, display a set of response controls configured to receive response data that includes at least two response mediums.
9. The system of claim 1, wherein the at least two response mediums include a first response medium that describes a qualitative response and a second response medium that describes a quantitative response.
10. The system of claim 1, wherein the at least two response mediums include a video response, wherein the processor is further configured to, when performing the multi-modal analysis of the video response, perform two or more of:
- (i) using images from the video response as a first medium;
- (ii) using audio from the video response as a second medium; and
- (iii) creating a text transcript based on the audio, and using the text transcript as a third medium.
11. The system of claim 1, wherein the survey interface is configured to display a sequence of survey screens, wherein each of the sequence of survey screens includes at least one text prompt and at least one response control, and wherein the processor is further configured to:
- (i) receive a portion of the response dataset via a first survey screen from the sequence of survey screens; and
- (ii) after performing the multi-modal analysis for a topic of the set of topics that is reflected in the portion of the response dataset, create a dynamic real time question based on the sentiment for the topic, wherein the dynamic real time question is associated with at least one text prompt and at least one response control.
12. The system of claim 11, wherein the processor is further configured to, in parallel with creating the dynamic real time question:
- (i) if the dynamic real time question is not yet created, cause the survey interface to display the next survey screen in the sequence of survey screens as a subsequent survey screen; and
- (ii) if the dynamic real time question is created, cause the survey interface to display a dynamic survey screen that includes the at least one text prompt and the at least one response control as the subsequent survey screen.
13. The system of claim 1, wherein the data storage is further configured to store configurations for an augmented reality object that is associated with one or more topics of the plurality of topics, and wherein the survey interface is configured to cause a user device of the plurality of user devices to:
- (i) display the augmented reality object on a display of the user device as an overlay upon an image captured by a camera of the user device; and
- (ii) provide a set of user controls that may be interacted with to modify a rotational orientation of the overlay of the augmented reality object.
14. The system of claim 13, wherein the processor is further configured to receive, as the at least two response mediums, a video response and an augmented reality object response, wherein:
- (i) the augmented reality object response includes a timeline of the rotational orientation of the overlay; and
- (ii) performing the multi-modal analysis includes determining the attribute for each of the set of topics based on the association between: (A) individual sentiment for a time period from the video response; and (B) the rotational orientation of the augmented reality object during the time period.
15. A method for conducting a survey to collect and analyze mixed medium responses from a plurality of respondents, the method comprising, by a processor:
- (i) providing a survey interface to a plurality of user devices associated with the plurality of respondents, wherein the survey interface is configured to receive a response dataset from each respondent that includes data of at least two response mediums;
- (ii) for each response dataset, determining a set of topics described in that response dataset, and adding that response dataset to sets of response data stored by a data storage;
- (iii) determining an attribute for each of the set of topics based on a multi-modal analysis of that response dataset, and adding the attribute for each of the set of topics to an insight dataset stored by the data storage; and
- (iv) providing an insight interface dataset based on the insight dataset, wherein the insight interface dataset comprises at least a description of the plurality of topics and, for each of the plurality of topics, a topic attribute associated with that topic.
16. The method of claim 15, wherein the insight interface dataset is configured to cause an administrative device to display an insight interface that is configured to:
- (i) for each of the plurality of topics, display that topic and the topic attribute associated with that topic; and
- (ii) provide a set of controls that may be interacted with via the administrative device to, for a selected topic of the plurality of topics, present raw response data from one or more respondents whose response datasets contributed to determining the attribute for the selected topic.
17. The method of claim 16, further comprising when presenting raw response data via the insight interface:
- (i) selecting for presentation a set of relevant video and audio from the response dataset based on a time-index association with the selected topic; and
- (ii) omitting from presentation any portion of the response dataset other than the set of relevant video and audio.
18. The method of claim 15, wherein the survey interface is configured to display a sequence of survey screens, wherein each of the sequence of survey screens includes at least one text prompt and at least one response control, the method further comprising:
- (i) receiving a portion of the response dataset via a first survey screen from the sequence of survey screens; and
- (ii) after performing the multi-modal analysis for a topic of the set of topics that is reflected in the portion of the response dataset, creating a dynamic real time question based on the sentiment for the topic, wherein the dynamic real time question is associated with at least one text prompt and at least one response control.
19. The method of claim 18, further comprising, in parallel with creating the dynamic real time question:
- (i) if the dynamic real time question is not yet created, causing the survey interface to display the next survey screen in the sequence of survey screens as a subsequent survey screen; and
- (ii) if the dynamic real time question is created, causing the survey interface to display a dynamic survey screen that includes the at least one text prompt and the at least one response control as the subsequent survey screen.
20. A system for conducting a survey to collect and analyze mixed medium responses from a plurality of respondents, the system comprising:
- (a) a server comprising a processor and a memory; and
- (b) a data storage configured to store sets of response data received from the plurality of respondents and an insight dataset comprising the results of one or more analyses of a plurality of topics described in the sets of response data, wherein the stored sets of response data comprise time-indexed raw response data received from the plurality of respondents;
- wherein the processor is configured to:
- (i) provide a survey interface to a plurality of user devices associated with the plurality of respondents, wherein the survey interface is configured to receive a response dataset from each respondent that includes data of at least two response mediums;
- (ii) for each response dataset, determine a set of topics described in that response dataset, and add that response dataset to the stored sets of response data;
- (iii) determine an attribute for each of the set of topics based on a multi-modal analysis of that response dataset, and add the attribute for each of the set of topics to the insight dataset, wherein the attribute for each of the set of topics describes sentiment of a respondent corresponding to that response dataset for that topic; and
- (iv) cause an insight interface to display on an administrative device based on the insight dataset, wherein the insight interface is configured to: (A) for each of the plurality of topics, display a description of that topic and a topic attribute associated with that topic; and (B) provide a set of controls that may be interacted with via the administrative device to, for a selected topic of the plurality of topics, present raw response data from one or more respondents whose response datasets contributed to determining the topic attribute for the selected topic.