Framework for Augmented Machine Decision Making

Info

Publication number: 20170098162
Type: Application
Filed: Oct 6, 2016
Publication Date: Apr 6, 2017
Inventors: Michael Ellenbogen (Wayland, MA), M. Brendan McCord (Boston, MA), Brian Knoth (East Walpole, MA), Brandon Wolfe (Tyngsboro, MA)
Application Number: 15/287,599

Abstract

Sensor data is received. The sensor data is classified into one of two or more classes by at least requesting processing of a machine computational component, receiving a result of the machine computation component, requesting processing of an agent computation component, and receiving a result of the agent computation component. The agent computation component includes a platform to query an agent. The result from the agent computation component or the result from the machine computation component is provided. Related apparatus, systems, techniques, and articles are also described.

Description

RELATED APPLICATION

This application claims priority under 35 U.S.C. §119(e) to U.S. provisional application No. 62/237,733 filed Oct. 6, 2015, the entire contents of which are hereby expressly incorporated by reference herein.

TECHNICAL FIELD

The subject matter described herein relates to improving machine decision making.

BACKGROUND

In artificial intelligence (AI), difficult problems are informally known as AI-complete or AI-hard, implying that the difficulty of these computational problems is equivalent to that of solving the central artificial intelligence problem, which is making computers as intelligent as people, also referred to as strong AI. An AI-complete problem is one not solved by a simple specific algorithm. AI-complete problems include computer vision, natural language understanding, dealing with unexpected circumstances while solving any real world problem, and the like. Currently, AI-complete problems cannot be solved with modern computer technology alone.

Current AI systems can solve very simple restricted versions of AI-complete problems, but never in their full generality. When AI researchers attempt to “scale up” their systems to handle more complicated, real world situations, the programs tend to become excessively brittle without commonsense knowledge or a rudimentary understanding of the situation. In other words, they fail as unexpected circumstances outside of their original problem context begin to appear. When human beings are dealing with new situations in the world, they know what to expect: they know what all things around them are, why they are there, what they are likely to do and so on. Humans can use context and experience to guide them in recognizing unusual situations and adjusting accordingly. A machine without strong AI has no other skills to fall back on, so some machine decision-making applications are intractable.

SUMMARY

In an aspect, sensor data is received. The sensor data is classified into one of two or more classes by at least requesting processing of a machine computational component, receiving a result of the machine computation component, requesting processing of an agent computation component, and receiving a result of the agent computation component. The agent computation component includes a platform to query an agent. The result from the agent computation component or the result from the machine computation component is provided.

In another aspect, sensor data of a security system asset is received. A predefined modality associated with the security system asset is accessed. The modality defines a computational task for analyzing the received sensor data. A solution state machine object having a plurality of states and rules for transitioning between the plurality of states is instantiated. The plurality of states includes an initial state, a first intermediate state, a second intermediate state, and a terminal state. The task is executed using the solution state machine object. The executing includes requesting processing of the task by, and receiving a result of, a machine computation component when a current state of the solution state machine object is the first intermediate state. The result received from the machine computation component includes a first confidence measure. The executing includes requesting processing of the task by, and receiving a result of, an agent computation component when the current state of the solution state machine object is the second intermediate state. The result received from the agent computation component includes a second confidence measure. The executing includes transitioning the current state of the solution state machine object according to the transition rules and at least one of: the first confidence measure and the second confidence measure. A characterization of the terminal state is provided when the current state of the solution state machine object is the terminal state.

One or more of the following features can be included in any feasible combination. For example, processing of the agent computation component can be requested when a confidence of the machine computation component result is below a first threshold. Processing of the agent computational component can be requested when the confidence of the machine computation component result is above a second threshold. Providing can include requesting further processing of the agent computation component result by the machine computation component. The providing can include requesting further processing of the machine computation component result by the agent computation component.
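The threshold-based routing described above can be sketched as follows. This is only an illustrative sketch: the function name should_query_agent and the numeric threshold values are assumptions made for the example and are not specified in this description.

```python
from typing import Optional


def should_query_agent(confidence: float,
                       first_threshold: float = 0.6,
                       second_threshold: Optional[float] = None) -> bool:
    """Decide whether a machine result should be routed to the agent component.

    first_threshold and second_threshold are hypothetical values chosen for
    illustration; a deployment would configure them per modality.
    """
    # Low machine confidence: ask an agent to perform the task.
    if confidence < first_threshold:
        return True
    # Optionally also route very confident detections to an agent,
    # for example to confirm an alarm before it is raised.
    if second_threshold is not None and confidence > second_threshold:
        return True
    return False
```

With these example values, a machine result with confidence 0.3 would be routed to an agent, a result with confidence 0.8 would not, and a result with confidence 0.95 would be routed only if a second threshold such as 0.9 were configured.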

The machine computation component can include a deep learning artificial intelligence classifier. The machine computation component can detect objects and classify objects in the sensor data. The sensor data can include an image.

A composite result from the machine computation component result and the agent computation component result can be determined. The determining can include using a measure of result confidence. At least one of the receiving, classifying, and providing can be performed by at least one data processor forming part of at least one computing system.

The machine computation component can execute a machine learning algorithm to perform the task. The machine computation component can include a convolutional neural network.

The agent computation component can include a platform that queries at least one agent, receives a query result, determines a confidence measure of the agent, and determines the second confidence measure using the confidence measure of the queried agent.

The sensor data can include an image, including a single image, a series of images, or a video. The computational task can include: detecting a pattern in the image; detecting a presence of an object within the image; detecting a presence of a person within the image; detecting intrusion of the object or person within a region of the image; detecting suspicious behavior of the person within the image; detecting an activity of the person within the image; detecting an object carried by the person; detecting a trajectory of the object or the person in the image; determining a status of the object or person in the image; identifying whether a person who is detected is on a watch list; determining whether a person or object has loitered for a certain amount of time; detecting interaction among persons or objects; tracking a person or object; determining status of a scene or environment; determining the sentiment of one or more people; counting the number of objects or people; determining whether a person appears to be lost; determining whether an event is normal or abnormal; and/or determining whether text matches that in a database.

The security system asset can include an imaging device, a video camera, a still camera, a radar imaging device, a microphone, a chemical sensor, an acoustic sensor, a radiation sensor, a thermal sensor, a pressure sensor, a force sensor, or a proximity sensor. The modality can define solution state machine object attributes, acceptable confidence for reaching the terminal state, a set of assets that trigger the modality, and/or agent query structure.

Executing the task can include posting, via a messaging queuing protocol, requested processing tasks. The machine computation component and agent computation component can include microservices operating on tasks posted via the messaging queue protocol.
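As a rough illustration of posting tasks for microservices to consume, the sketch below uses Python's in-process queue.Queue as a stand-in for a real message broker. The handler functions, message fields, and returned values are all hypothetical; this description does not name a specific messaging queue protocol or message format.

```python
import queue

# In-process queues standing in for a message broker (assumption for the sketch).
tasks: "queue.Queue[dict]" = queue.Queue()
results: "queue.Queue[tuple]" = queue.Queue()


def post(component: str, payload: str) -> None:
    """Post a requested processing task addressed to one component."""
    tasks.put({"component": component, "payload": payload})


# Hypothetical microservice handlers returning canned results.
HANDLERS = {
    "machine": lambda payload: {"label": "person", "confidence": 0.62},
    "agent": lambda payload: {"label": "person", "confidence": 0.97},
}


def drain() -> None:
    """Deliver every queued task to the handler for its component."""
    while not tasks.empty():
        message = tasks.get()
        handler = HANDLERS[message["component"]]
        results.put((message["component"], handler(message["payload"])))


post("machine", "frame-1")
post("agent", "frame-1")
drain()
```

In a distributed deployment each handler would run as its own service and subscribe to the queue independently; the single drain loop here only keeps the example self-contained.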

A predictive model of the machine computation component can be modified using the result received from the agent computation component as a supervisory signal and the received sensor data as input.
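One way to picture using the agent result as a supervisory signal is a single gradient step on a logistic model, sketched below. The logistic model, the learning rate, and the function name sgd_update are assumptions chosen for brevity; the predictive model in this description may be a deep neural network trained quite differently.

```python
import math
from typing import List, Tuple


def sgd_update(weights: List[float], bias: float, features: List[float],
               agent_label: int, learning_rate: float = 0.1
               ) -> Tuple[List[float], float]:
    """Apply one stochastic gradient step of logistic regression.

    features stand for the received sensor data (model input) and
    agent_label (0 or 1) is the agent result used as the training target.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    prediction = 1.0 / (1.0 + math.exp(-z))
    error = prediction - agent_label  # gradient of the log loss w.r.t. z
    new_weights = [w - learning_rate * error * x
                   for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias
```

Starting from zero weights, one update with features [1.0, 2.0] and an agent label of 1 moves the weights toward predicting the agent's answer for similar inputs.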

At least one of the receiving, accessing, instantiating, executing, and providing is performed by at least one data processor forming part of at least one computing system.

In yet another aspect, sensor data is received. The sensor data is classified into a first class by at least requesting processing of a machine computational component, receiving a first result of the machine computation component, requesting processing of an agent computation component, and receiving a first result of the agent computation component. The agent computation component includes a platform to query an agent. The sensor data can be classified into a second class by at least requesting processing of the machine computational component, receiving a second result of the machine computation component, requesting processing of the agent computation component, and receiving a second result of the agent computation component. A set of rules is applied to the first class and the second class to enable a determination of a composite classification. The composite result is provided.

In yet another aspect, first sensor data of a first security system asset and second sensor data of a second security system asset are received. A first predefined modality associated with the first security system asset and a second predefined modality associated with the second security system asset are accessed. The first modality defines a first computational task for analyzing the received first sensor data. The second modality defines a second computational task for analyzing the received second sensor data. A first solution state machine object and a second solution state machine object are instantiated. The first solution state machine object has a plurality of states and rules for transitioning between the plurality of states. The plurality of states includes an initial state, a first intermediate state, a second intermediate state, and a terminal state. A result of the first task and a result of the second task are determined by executing the first task using the first solution state machine object and the second task using the second solution state machine object. The executing includes requesting processing of the first task by a machine computation component and an agent computation component. A composite result is determined by applying a set of rules to the result of the first task and the result of the second task. The composite result is provided.

One or more of the following features can be included in any feasible combination. The set of rules can include matching sensor data within a predetermined time-window. The providing can include requesting further processing of the machine computation component result by the agent computation component. The machine computation component can detect objects and classify objects in the sensor data. At least one of the receiving, classifying, and providing can be performed by at least one data processor forming part of at least one computing system. The sensor data can include a first image of a first security system asset and a second image of a second security system asset. At least one of the receiving, accessing, instantiating, executing, and providing can be performed by at least one data processor forming part of at least one computing system.
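A composite rule that matches results from two assets within a predetermined time-window might look like the following sketch. The TaskResult record, the labels, and the 30 second window are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TaskResult:
    asset_id: str      # which security system asset produced the data
    label: str         # classification reached by that asset's modality
    timestamp: float   # capture time in seconds


def composite_match(first: TaskResult, second: TaskResult,
                    window_s: float = 30.0,
                    required: Tuple[str, str] = ("person", "vehicle")) -> bool:
    """Rule: both results carry the required labels and fall in the window."""
    labels_ok = (first.label, second.label) == required
    in_window = abs(first.timestamp - second.timestamp) <= window_s
    return labels_ok and in_window


gate = TaskResult("camera-gate", "person", 100.0)
lot = TaskResult("camera-lot", "vehicle", 120.0)
late_lot = TaskResult("camera-lot", "vehicle", 200.0)
```

Here composite_match(gate, lot) holds because the two results are 20 seconds apart, while composite_match(gate, late_lot) does not because the results are 100 seconds apart.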

Non-transitory computer program products (i.e., physically embodied computer program products) are also described that store instructions, which when executed by one or more data processors of one or more computing systems, cause at least one data processor to perform operations herein. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.

The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 is a process flow diagram illustrating a method of augmenting artificial intelligence with human intelligence tasks;

FIG. 2 is a state diagram of an example solution state machine as defined by a modality;

FIG. 3 is a diagram illustrating composite modalities to solve a higher-order problem;

FIG. 4 is a process flow diagram illustrating a method of augmenting artificial intelligence using composite modalities;

FIG. 5 is a system block diagram of an example analysis platform including software components for combining machine and human intelligence as a solution for responding to questions and problem scenarios;

FIG. 6 illustrates an exchange of an event messaging system;

FIG. 7 illustrates data flow between components of a platform during a process of augmenting artificial intelligence with human computation;

FIG. 8 is a block diagram illustrating example metadata;

FIGS. 9-11 are tables illustrating example modalities and example security scenarios to which the modality can apply;

FIG. 12 is a system block diagram of an example machine computation component system that implements a deep learning based object detector;

FIG. 13 illustrates an example input image and an example output image to an artificial intelligence system;

FIG. 14 is a system block diagram illustrating an object detector web application program interface (API);

FIG. 15 is a system block diagram illustrating an example system including a human-computation element and a machine decision-making algorithm;

FIG. 16A is a process for injecting human-computation into a machine decision-making algorithm;

FIG. 16B illustrates an example image;

FIG. 17 is a system block diagram illustrating an example implementation of the current subject matter for a video/face recognition system;

FIGS. 18 and 19 are process flow diagrams illustrating using the current subject matter for face recognition and using the face recognition system;

FIGS. 20 and 21 illustrate applying the current subject matter to handle a wide variety of tasks, such as counting sports utility vehicles (SUVs) in a parking lot or validating computer vision analytic performance; and

FIG. 22 is a block diagram illustrating an example of hardware used by the current subject matter.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

The current subject matter relates to utilizing a “human-in-the-loop” (symbiotic human-machine) approach to facilitating decision making. Humans can contribute entirely new decisions/answers or assist when AI is not highly confident, and in that way augment/assist the machine process in solving a particular task, rather than merely verifying the computer decision making. The current subject matter can expand the range of use cases to which a machine decision making system or a given sensor and/or analytic may effectively apply. The current subject matter can provide for injection of a human-computation element into a machine decision-making algorithm, allowing for a human to perform (or solve) specific and narrow decisions that the machine decision making system would otherwise be unable to perform (or would perform poorly). The subject matter can be used with applications that do not currently include machine decision-making algorithms or use algorithms that do not adequately meet user needs, for example a closed circuit television system that currently does not have a machine decision-making algorithm or has limited machine decision-making capability. The current subject matter can enable new capabilities and improve machine decision making, for example, by reducing false alarms, increasing hits, reducing misses, and increasing correct rejections.

While great advances have been made in the area of artificial intelligence, the performance of software-only systems often falls short of that which is needed for applications involving analysis of physical world imagery, video, language processing, and the like. Key challenges for end users are the prevalence of false positives (“false alarms”), the variation in system performance caused by changes in circumstances or scene type (“brittleness”), and the inability for these systems to produce human-like outputs in scenarios that are highly subjective or contextual (as is frequently the case in the physical security domain). The current subject matter includes data analysis and handling that uses human-in-the-loop processing (also referred to as human intelligence tasks) alongside artificial intelligence to address the aforementioned challenges, by combining the respective strengths of computer and human processing, while minimizing the amount of human involvement required.

The current subject matter can include an analysis platform for augmenting machine processing with human intelligence tasks to improve performance and reduce false alarms. The analysis platform can include a machine computation component, described more fully below, that can include predictive models built using a machine learning algorithm, for example, a deep neural network. The machine computation component can classify input data into two or more classes.

The analysis platform can include an agent computation component, described more fully below, that can include a system for querying a pool of agents (e.g., humans) to perform a task, such as detecting a presence of a person or object in an image, answering a question regarding a characteristic of the image, and the like. In some implementations, the agent computation component can provide a query result in substantially real-time, such as within 5, 10, or 30 seconds of receiving a query request. In some implementations, the analysis platform can be applied to a physical security and surveillance domain, which is highly subjective and contextual.

The current subject matter can include use of modalities, which enables any given problem to be broken or segmented into computational tasks. Some tasks may be better performed by an existing artificial intelligence predictive model while other tasks may be better performed by a human. Thus, the current subject matter can route a given task for processing by either an artificial intelligence processing component or an agent (e.g., human) processing component. The concept of modalities can be extended to composite modalities, whereby multiple modalities are combined (e.g., strung together) to solve more difficult and even subjective tasks. Composite modalities can be accurate because the confidence of the result of each underlying modality can be high (e.g., treated as truth).

Because some artificial intelligence systems can be continually trained, their performance can improve over time. The current subject matter can route tasks based on machine performance, which can be represented by a confidence metric produced by the artificial intelligence system. As the artificial intelligence component is trained on more real-world data, the artificial intelligence component can become more accurate and less agent input is required. Thus, the relative processing burdens between the artificial intelligence component and the human intelligence component are dynamic and can vary over time.

FIG. 1 is a process flow diagram illustrating a method 100 of augmenting artificial intelligence with human intelligence tasks. The method 100 of augmenting artificial intelligence with human intelligence tasks is implemented using flow control, which can be represented as a state machine for solving a computational task.

At 110, sensor data is received. The sensor data can be received from and/or of a security system asset. An asset can include an imaging device, a video camera, a still camera, a radar imaging device, a microphone, a chemical sensor, an acoustic sensor, a radiation sensor, a thermal sensor, a pressure sensor, a force sensor, a proximity sensor or a number of other sensor types. “Sensor,” as used herein may include information that did not originate specifically from physical hardware, such as a computer algorithm. The sensor data can include, for example, an image (e.g., optical, radar, and the like), video, audio recording, data generated by any of the above-enumerated assets, and the like. In some implementations, the sensor data can be from a system other than a security system, for example, the sensor data can be access control system data, weather system data, data about the risk posed by an individual or the risk of a security threat given a set of conditions. Other system types are possible.

The security system can include a number of deployment types including closed circuit television, surveillance camera, retail camera, mobile device, body cameras, drone footage, personnel inspection systems, object inspection systems, and the like.

The security system can be implemented in many ways. For example, the security system can include a system to detect for physical intrusion into a space (e.g., whether a person is trespassing in a restricted area); a system to determine whether an individual should or should not be allowed access (e.g., a security gate); a system to detect for objects, people, or vehicles loitering in a region; a system to detect for certain behavior exhibited by a person (e.g., suspicious behavior); a system to track a person or object viewed from one asset (e.g., camera) to another asset; a system to determine the status of an object in the asset field of view (e.g., whether there is snow on a walkway); a system to count people or objects (e.g., vehicles) in a scene; a system to detect for abnormal conditions (e.g., as compared to a baseline condition); a system to detect license plates over time; a system to detect for weapons, contraband, or dangerous materials on a person or within a container (e.g., a security checkpoint); and the like.

At 120, a predefined modality is accessed. The accessing can be from memory. The predefined modality can be associated with the security system asset. The modality can define a computational task for analyzing the received sensor data. For example, where the asset is a video camera monitoring the threshold of a building, the predefined modality can include a computational task that specifies that an image taken by the asset should be processed to detect for a presence of a person in the threshold (e.g., a region of the image). Associated with the asset can be a collection of configurable data that can be provided for each asset modality pairing. Asset details can include, for example, inclusion areas, exclusion areas, filtering parameters, region of interest requirements and the like. These are all specific to the asset scene for that modality.

A modality can be considered an architectural concept that, when used as building blocks, can capture a pattern of security objectives. An analysis platform can expose modalities as building blocks for clearly articulating the problem to be solved. An example modality can include an intrusion detection scenario, where the pattern represented is one of first detecting that a trigger has happened and that the trigger was caused by a human and that the human is intruding upon a defined area. A modality can guide and coordinate machine computation components and agent computation components of the platform. Modalities can provide direction to the analysis platform regarding what the security system is trying to detect or control.

In some implementations, the predefined modality can define a solution state machine or flow control that provides a framework for utilizing the processing components of the analytical platform to solve a problem (which could be a piece of a larger scenario). Each computation component can have access to the solution state machine and can advance the state. The solution state machine can have states including an initial state, intermediate states, and terminal states. Each state can correspond to a particular type of processing by components of the platform. For example, to do trigger detection, a flow control first tries to use a machine computation component to determine if a person is in the frame, and once a person is detected, sends the frame to an agent to determine if the person has crossed a determined threshold. Further, flow control can be used to ensure or improve a certain level of confidence in a determination thereby reducing false alarms. For example, if the machine computation component detects the presence of a person in the frame but returns a low confidence (e.g., characteristics of the image make it challenging for the predictive model to accurately perform) the analysis platform can utilize the agent computation component to process the task (e.g., detect whether a person is present in the frame). The agent computation component can utilize human judgement to perform the task, which may be better suited than the machine computation component.

Sensor data can be initially processed to detect an event. Initial processing can include video motion detection that, when motion is detected, triggers an event. In some implementations, the initial processing can include video analytics that, when an object of interest is detected or a rule is satisfied, triggers an event. Occurrence of an event can start a new state machine invocation (embodied in a task), which maintains its state throughout its tasking until it reaches a terminal node. Every participant involved in the process of solving the problem can access and potentially advance the state machine.

The solution state machine can include transition rules for transitioning between states. These rules can be based on the confidence of the associated processing that takes place when the solution state machine is in the state. The solution state machine can also be represented as a directed graph where intermediate nodes correspond to processing components and edges define transition rules.

The predefined modality can also define or include: the ultimate question to be answered, which can be customized for the particular modality; an agent work form, which can be the type of form agents would be served to best answer the question based on the artifacts received from the assets; an acceptable confidence, which is the acceptable thresholds for considering the question met or not by an artifact; and a set of assets which can trigger tasking for this modality (in other words, the set of sensors/cameras that provide the artifacts specific to this tasking where each asset in the set is independent, meaning, that triggering from each one will cause independent tasking and question resolution).
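The elements a modality can define, as listed above, could be captured in a simple record such as the sketch below. The field names, the example question, the form identifier, and the asset identifiers are all hypothetical; they are not taken from this description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Modality:
    name: str
    question: str                 # the ultimate question to be answered
    agent_work_form: str          # type of form served to agents
    acceptable_confidence: float  # threshold for considering the question met
    trigger_assets: List[str] = field(default_factory=list)

    def triggers_on(self, asset_id: str) -> bool:
        """Return True when an event from this asset should start tasking."""
        return asset_id in self.trigger_assets


# Illustrative instance for an intrusion detection scenario.
intrusion = Modality(
    name="intrusion_detection",
    question="Is a person inside the marked region?",
    agent_work_form="yes_no_with_region_overlay",
    acceptable_confidence=0.9,
    trigger_assets=["camera-lobby", "camera-dock"],
)
```

Because each asset in the set is independent, an event from camera-lobby and an event from camera-dock would each start their own tasking against this one modality definition.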

Concretely, FIG. 2 is a state diagram of an example solution state machine 200 as defined by a modality. “S” is a start state, “MI” is a first machine intelligence state, “MI2” is a second machine intelligence state, “HI” is a human intelligence state, “ES” is a terminal state corresponding to a successful match (e.g., pattern match, classification, detection, and the like), and “EF” is a terminal state corresponding to an unsuccessful match (e.g., pattern match, classification, detection, and the like). “C” relates to confidence of the processing at each state and “T” relates to the number of times the associated processing has been performed. Transition rules are Boolean operators of the confidence (“C”) and processing times (“T”).
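Transition rules expressed as Boolean operators of “C” and “T” can be sketched as predicates attached to each state. The state names below follow FIG. 2, but the specific edges, the 0.9 confidence cutoff, and the limit of three agent attempts are assumptions made for the example; the actual rules of FIG. 2 are not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

# A rule takes the confidence C and the processing count T and returns a bool.
Rule = Callable[[float, int], bool]

# Hypothetical rule set: each state maps to ordered (rule, next_state) pairs.
RULES: Dict[str, List[Tuple[Rule, str]]] = {
    "S": [(lambda c, t: True, "MI")],
    "MI": [(lambda c, t: c >= 0.9, "ES"),
           (lambda c, t: True, "MI2")],
    "MI2": [(lambda c, t: c >= 0.9, "ES"),
            (lambda c, t: True, "HI")],
    "HI": [(lambda c, t: c >= 0.9, "ES"),
           (lambda c, t: c < 0.9 and t < 3, "HI"),  # ask again
           (lambda c, t: True, "EF")],
}


def next_state(state: str, c: float, t: int) -> str:
    """Return the target of the first rule that holds for (C, T)."""
    for rule, target in RULES[state]:
        if rule(c, t):
            return target
    raise ValueError(f"no transition rule matched in state {state!r}")
```

Under these assumed rules, a low-confidence human intelligence result is retried until the third attempt, after which the state machine ends in the unsuccessful terminal state.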

Referring again to FIG. 1, at 130, a flow control, or solution state machine object can be instantiated. The instantiating creates a concrete occurrence of the solution state machine object that exists during runtime. The solution state machine object can have at least two intermediate states, one associated with machine computation component processing and one associated with agent computation component processing.

At 140, the computational task is executed using the solution state machine object. A solution state machine object can be represented in persistent data as a transition table and can be accessible for querying and changing state. Executing the computational task using the solution state machine object provides a data driven means of orchestrating the participants in the analysis platform to drive the participants (e.g., the computation components) closer to a confident solution or quickly to a non-solution. A data driven flow can eliminate the need to actually code a solution and can allow distributed components to cooperate on driving the state machine for any external request.
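A transition table held as persistent data might be sketched as below, where the table is plain data that can be serialized (here as JSON) and looked up without any solution-specific code. The row layout, the min_confidence field, and the threshold values are assumptions for illustration; the state names are borrowed from FIG. 2.

```python
import json
from typing import List

# Hypothetical table rows; within a state, the first matching row wins.
ROWS = [
    {"state": "MI", "min_confidence": 0.9, "next": "ES"},
    {"state": "MI", "min_confidence": 0.0, "next": "HI"},
    {"state": "HI", "min_confidence": 0.9, "next": "ES"},
    {"state": "HI", "min_confidence": 0.0, "next": "EF"},
]

# Round trip through JSON to mimic storing and reloading the table.
stored = json.dumps(ROWS)
table: List[dict] = json.loads(stored)


def lookup(rows: List[dict], state: str, confidence: float) -> str:
    """Return the next state for the current state and confidence."""
    for row in rows:
        if row["state"] == state and confidence >= row["min_confidence"]:
            return row["next"]
    raise KeyError(f"no row for state {state!r}")
```

Because the flow lives in data, changing a threshold or adding a state would mean editing rows rather than editing the components that consult the table.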

Execution of the computational task can include, at 142, requesting processing of the task by, and receiving a result of, a machine computation component when the current state of the solution state machine object is in a machine computation component state. The machine computation component can execute the task by applying a predictive model to the sensor data to determine an output (e.g., pattern match, classification, detection, and the like). The machine computation component can also determine a confidence measure of its output. The confidence measure can characterize a likelihood that the output of the machine computation component is correct. For example, in the implementation where the machine computation component is a convolutional neural net, the convolutional neural network's last layer can be a logistic regression layer, which classifies image patches into labels. During the training phase this value can be set to 1 for positive examples and to 0 for negative examples. During the operational phase (e.g., when applying new data to the convolutional neural network) this value can be the probability of an input image being the object of interest.
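The way a logistic regression output layer yields a value between 0 and 1 that can serve as a confidence measure can be sketched as follows. The feature values, weights, and function name are hypothetical; in a convolutional neural network the features would be the activations feeding the final layer.

```python
import math
from typing import Sequence


def logistic_confidence(features: Sequence[float],
                        weights: Sequence[float],
                        bias: float) -> float:
    """Map a weighted sum of features to a probability with the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

A weighted sum of zero maps to a probability of 0.5, and larger positive sums approach 1, which matches the training targets of 1 for positive examples and 0 for negative examples.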

Execution of the computational task can include, at 144, requesting processing of the task by, and receiving a result of, an agent computation component when the current state of the solution state machine object is in an agent computation component state. The agent computation component can execute the task by querying one or more agents in a pool of agents to perform the task, such as an image recognition task, answering a question regarding a characteristic of the image, and the like. In some implementations, the agent computation component can provide a query result in substantially real-time, such as within 5, 10, or 30 seconds of receiving a query request. The agent computation component can also determine a confidence measure of its output. The confidence measure may be directly supplied by an agent or can be determined by the agent computation component using an algorithm that assesses the accuracy and reliability of the agent that provides a response. The agent computation component can query multiple agents and create a composite output and a composite confidence. The confidence measure can characterize a likelihood that the output of the agent computation component is correct.
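One possible way to form a composite output and composite confidence from multiple agents is a reliability-weighted vote, sketched below. The weighting scheme and the reliability scores are assumptions; this description does not specify the algorithm used to assess agent accuracy and reliability.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def composite_answer(responses: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Combine (answer, agent_reliability) pairs into one answer.

    Each agent's vote is weighted by its reliability score in [0, 1].
    The composite confidence is the winning share of the total weight.
    """
    if not responses:
        raise ValueError("at least one agent response is required")
    weight: Dict[str, float] = defaultdict(float)
    for answer, reliability in responses:
        weight[answer] += reliability
    total = sum(weight.values())
    if total == 0:
        raise ValueError("all agent reliabilities are zero")
    best = max(weight, key=lambda answer: weight[answer])
    return best, weight[best] / total
```

For example, two agents with reliabilities 0.9 and 0.8 answering "person" and one agent with reliability 0.3 answering "no_person" would give the composite answer "person" with a composite confidence of 1.7 / 2.0 = 0.85.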

Execution of the computational task can include, at 146, transitioning the current state of the solution state machine object according to the transition rules. For a given state, the transition rules can be applied when a result of a computation component is returned by a respective computation component. By applying the transition rules, the current state of the solution state machine can change (according to the transition rules) and, when a new state is entered, an associated processing step can be performed.

Execution of the computation task can include one or more of requesting processing of the task by, and receiving a result of, a machine computation component 142; one or more of requesting processing of the task by, and receiving a result of, an agent computation component 144; and one or more of transitioning the current state of the solution state machine object according to the transition rules 146. Execution of the computation task can be performed according to the solution state machine object states and transition rules as specified in the predefined modality. Thus, execution of the computation task is a flexible process that can vary, for example, between tasks and specific content of the sensor data.

Once the current state of the solution state machine object is a terminal state, at 150, a characterization of the terminal state can be provided. The characterization may relate to a classification of the sensor data (according to the task). For example, if the task being performed is to detect whether or not there is a person in an image, a given solution state machine object may have two terminal states, a first terminal state (e.g., a solution state) that is reached if the agent computation component and machine computation component can provide a classification with a certain level of confidence (e.g., 0.9), and a second terminal state (e.g., a non-solution state) when the agent computation component and machine computation component cannot provide a classification with a certain level of confidence (e.g., less than 0.9).
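
For illustration only, the interplay of states, transition rules, and terminal states described above can be sketched as follows. The state names, the single 0.9 threshold, and the rule that a low-confidence machine result escalates to the agent computation component are assumptions; an actual modality can define different states and transition rules.

```python
MACHINE, AGENT = "MACHINE", "AGENT"
SOLUTION, NO_SOLUTION = "SOLUTION", "NO_SOLUTION"
TERMINAL = {SOLUTION, NO_SOLUTION}
THRESHOLD = 0.9  # example confidence level from the text


def transition(state, confidence):
    """Transition rule applied when a computation component returns a result."""
    if confidence >= THRESHOLD:
        return SOLUTION
    # Low-confidence machine result: escalate to the agent component.
    # Low-confidence agent result: end in the non-solution state.
    return AGENT if state == MACHINE else NO_SOLUTION


def run_task(sensor_data, machine, agent):
    """Drive the state machine until a terminal state is reached.

    machine and agent are callables returning (label, confidence).
    """
    state, label = MACHINE, None
    components = {MACHINE: machine, AGENT: agent}
    while state not in TERMINAL:
        label, confidence = components[state](sensor_data)
        state = transition(state, confidence)
    return state, label


# Stub components: the machine is unsure, the agent is confident.
state, label = run_task(
    "frame-001",
    machine=lambda data: ("person", 0.6),
    agent=lambda data: ("person", 0.95),
)
```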

The characterization of the terminal state can be provided, for example, as an alert to a manager of the security system. For example, the security system manager may have an escalation policy that requires they be alerted regarding the outcome of the task if the task detects a certain condition (e.g., intrusion into the building is occurring). The escalation alert can be in any form, such as MMS, SMS text, email, and the like.

Modalities can be considered as processing building blocks that answer relatively basic tasks. For example, FIGS. 9-11 are tables illustrating example modalities and example security scenarios to which the modality could apply. Modalities are a flexible and powerful tool for problem solving within the context of a human augmented machine decision making system. Modalities may be combined (or strung together) for answering complex and subjective problems. Modality composition is the ability to express a hierarchy of modalities such that positive results from lower tasking are passed up to a composite modality which aggregates multiple modality results to answer a higher-order question. The power of composite modalities can include the fact that truth (or high-confidence determinations) is established at terminal modalities and that truth is passed up to make very informed aggregate decisions.

For example, consider FIG. 3, which is a diagram illustrating composite modalities to solve a higher-order problem. A security system has 2 cameras with completely different fields of view; one (Camera-1) is inside the facility looking at a door and another (Camera-2) is outside the facility looking at a loading dock. The operator of the system should be alerted whenever someone enters the door and there is no truck in the loading dock. This problem (e.g., scenario) can be solved by composite modalities. Camera-1 can run an intrusion modality, while Camera-2 can run a presence modality. Each of these cameras can produce sensor data (e.g., artifacts) and provide the sensor data to the analysis platform. The analysis platform can initiate modality tasking for each of the two sensors independently. The security system operator can be alerted if there is an aggregate positive condition of both within the same time frame. Events across all sub-modalities can be recorded and correlation can be performed whenever a sub-modality triggers a match.

Modality composition can be defined by specific rules that match sub-modality results with each other to try and satisfy the composite modality. Composite rules can have specific logic for composing their sub-modalities. The logic can be augmented with customer input for rules (e.g., values) that should be used for a specific security system.

For example, the following composite modality rules can be defined: time synchronization, match list, time series, and value coordinated. Time synchronization rule attempts to match all sub-modality results that occur within the same time frame. The threshold of the time frame can be customer defined. All times used are the times stamped by the asset (e.g., camera) at the precise time the artifact is collected. Match list rule attempts to match all last recorded sub-modality results when any one of the sub-modalities has a new reporting. So, if sub-modality A has a new report, it attempts to match on whatever the last recorded value for sub-modality B is at that time. Time series rule attempts to match all sub-modality results which occur within a sequenced time period from one to the next. The sequence order and time thresholds can be customer defined. Value coordinated attempts to match all sub-modality results that have specific values for answers given by the analytic platform. The values and matching criteria can be provided by the customer. Other composite modality rules are possible.
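
Two of these rules, time synchronization and match list, can be sketched as follows. This is a minimal illustration under stated assumptions: the data shapes, the class and function names, and the treatment of a "match" as all sub-modality results being positive are hypothetical and not taken from the description.

```python
def time_synchronized(results, window_seconds):
    """Time synchronization rule.

    results maps a sub-modality name to (positive, asset_timestamp_seconds).
    Returns True if every result is positive and all asset timestamps fall
    within one customer-defined window.
    """
    if not all(positive for positive, _ in results.values()):
        return False
    stamps = [ts for _, ts in results.values()]
    return max(stamps) - min(stamps) <= window_seconds


class MatchList:
    """Match list rule: keep the last recorded result per sub-modality and
    re-evaluate the composite whenever any sub-modality has a new report."""

    def __init__(self, names):
        self.last = {name: None for name in names}

    def report(self, name, positive):
        self.last[name] = positive
        return all(value is True for value in self.last.values())


in_sync = time_synchronized({"A": (True, 100.0), "B": (True, 130.0)}, 60)

matcher = MatchList(["A", "B"])
first = matcher.report("A", True)   # B has no recorded value yet
second = matcher.report("B", True)  # matched against the last value of A
```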

FIG. 4 is a process flow diagram illustrating a method 400 of augmenting artificial intelligence using composite modalities. At 410, sensor data is received from a first security system asset and sensor data is received from a second security system asset. For example, the assets can include a first camera and a second camera. The cameras need not have overlapping fields of view.

At 420, a first predefined modality associated with the first security system asset and a second predefined modality associated with the second security system asset can be accessed. The first modality can define a computational task for analyzing the received first sensor data. The second modality can define a second computational task for analyzing the received second sensor data. For example, the first modality can be an intrusion modality and the second modality can be a presence modality.

At 430, a first solution state machine object and a second solution state machine object are instantiated. For example, each instantiation creates a concrete occurrence of the respective solution state machine object that exists during runtime. Each respective solution state machine object can have at least two intermediate states, one associated with machine computation component processing and one associated with agent computation component processing.

At 440, each task can be executed using its respective solution state machine object such that the processing includes processing by a machine computation component and by an agent computation component. After execution, each task has a result (for example, presence of a person or intrusion is detected).

At 450, a composite result can be determined by applying a set of rules to the results of the tasks. For example, the set of composite rules can include a rule requiring each modality result to be positive and that the sensor data that led to the positive results were obtained within one minute of one another.

At 460, the composite result can be provided. The composite result can be provided, for example, as part of an escalation policy to alert the security system operator. The composite result may relate to a classification of the sensor data (according to the task). The composite result can be provided, for example, as an alert to a manager of the security system. For example, the security system manager may have an escalation policy that requires they be alerted regarding the outcome of the task if the task detects a certain condition (e.g., intrusion into the building is occurring). The escalation alert can be in any form, such as MMS, SMS text, email, and the like.

In some implementations, the computational task includes: detecting a pattern in the image; detecting a presence of an object within the image; detecting a presence of a person within the image; detecting intrusion of the object or person within a region of the image; detecting suspicious behavior of the person within the image; detecting an activity of the person within the image; detecting an object carried by the person; detecting a trajectory of the object or the person in the image; determining a status of the object or person in the image; identifying whether a person who is detected is on a watch list (e.g., part of a gallery of face images); determining whether a person or object has loitered for a certain amount of time; detecting interaction among persons or objects; tracking a person or object; determining status of a scene or environment (e.g., cleanliness, feeling of safety, weather conditions); determining the sentiment of one or more people; counting the number of objects or people; determining whether a person appears to be lost (e.g., non-suspicious behavior); determining whether an event is normal or abnormal; and determining whether text (e.g., license plate text) matches that in a database. Other tasks are possible as the current subject matter can apply to a wide range of tasks.

As described above, the machine computation component can include an artificial intelligence (e.g., machine learning) system that develops and utilizes a predictive model. The machine computation component can include any number of algorithms. In some implementations, the machine computation component can include an artificial intelligence algorithm, a machine learning algorithm, a deep learning algorithm, a deep neural network, a convolutional neural network (CNN), a Faster Region-based CNN (R-CNN), and the like. For example, FIG. 12 is a system block diagram of an example machine computation component system 1200 that implements a deep learning based object detector 1210. The object detector 1210 includes a CNN for performing image processing including creating a bounding box around objects in an image and detecting or classifying the objects in the image. The input to the object detector is a digital image and the output is an array of bounding boxes and corresponding class labels. An example input image and an example output is illustrated in FIG. 13. The class labels are: person, car, helmet, and motor cycle.

In some implementations, Faster R-CNN incorporates flow information. This approach can reduce false alarms from the AI. A real time tracking method can be used. The real time tracking method uses data association and state estimation techniques to correct the bounding boxes and remove false positives. The tracking method assumes a linear velocity model and computes the location of the object in next frame using a Kalman Filter method.
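
The linear velocity model and Kalman filter prediction mentioned above can be illustrated with a one-coordinate tracker (for example, the horizontal center of a bounding box). This is a simplified sketch, not the tracking method of any particular implementation: the class name, the noise values, and the restriction to a single coordinate are assumptions, and data association is omitted.

```python
class ConstantVelocityKalman1D:
    """Tracks one coordinate with state [position, velocity].

    The process noise q and measurement noise r are illustrative values.
    """

    def __init__(self, position, q=0.01, r=1.0):
        self.x, self.v = position, 0.0
        self.p = [[100.0, 0.0], [0.0, 100.0]]  # large initial uncertainty
        self.q, self.r = q, r

    def predict(self, dt=1.0):
        """Propagate the state one frame ahead: x' = x + v * dt."""
        (p00, p01), (p10, p11) = self.p
        self.x += self.v * dt
        # P' = F P F^T + Q with F = [[1, dt], [0, 1]]
        self.p = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.x

    def update(self, z):
        """Correct the prediction with a measured position z."""
        (p00, p01), (p10, p11) = self.p
        s = p00 + self.r            # innovation covariance
        k0, k1 = p00 / s, p10 / s   # Kalman gain
        y = z - self.x              # innovation
        self.x += k0 * y
        self.v += k1 * y
        # P = (I - K H) P' with H = [1, 0]
        self.p = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


kf = ConstantVelocityKalman1D(position=0.0)
for z in range(1, 10):  # object moving one pixel per frame
    kf.predict()
    kf.update(float(z))
next_x = kf.predict()   # predicted location in the next frame
```

A detection whose position is far from the predicted location could then be treated as a candidate false positive, which is one way such a tracker can help correct bounding boxes.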

Before an object detector can be used for detecting objects, it needs to be trained. A training set can include one or more images with bounding boxes around objects the system is interested in detecting and the corresponding class labels. A database of training images can be created or maintained. In some implementations, the database can be updated over time with real world images and labels.

Hard negative mining can better train the convolutional neural network. The example Faster R-CNN uses background patches in the image as negative examples. In some implementations, since the number of background patches is generally much larger than the number of object patches, not all background patches can be included because doing so biases the object detection model. A specific ratio (20:1) of negative to positive examples can be maintained. Faster R-CNN can pick these negative examples randomly. For hard negative mining, those negative examples that result in the highest loss can be chosen. But this approach trains the predictive model only for difficult and unusual examples of objects. So half of the negative examples can be taken from the hard negatives (which give the highest loss) and half can be taken randomly from the rest of the negative examples.
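
The half-hard, half-random selection described above can be sketched as follows. The function name, the (patch identifier, loss) input format, and the fixed random seed are assumptions for illustration.

```python
import random


def select_negatives(negative_losses, num_positives, ratio=20, seed=0):
    """Select negative patches for one training round.

    negative_losses: list of (patch_id, loss) for background patches.
    Keeps at most ratio * num_positives negatives: half are the patches
    with the highest loss, half are sampled randomly from the remainder.
    """
    budget = min(len(negative_losses), ratio * num_positives)
    ranked = sorted(negative_losses, key=lambda item: item[1], reverse=True)
    num_hard = budget // 2
    hard, rest = ranked[:num_hard], ranked[num_hard:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, budget - num_hard)
    return [pid for pid, _ in hard] + [pid for pid, _ in sampled]


# 100 background patches, 2 object patches: 40 negatives are kept.
losses = [(i, i / 100) for i in range(100)]
chosen = select_negatives(losses, num_positives=2)
```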

In example implementations, a Faster R-CNN based object detector 1210 is used. The Faster R-CNN 1210 includes a bank of convolution layers 1220, a region proposal network (RPN) 1230, and an object classifier 1240. The bank of convolution layers 1220 finds features that are useful for two purposes: a) finding which rectangular regions in the image potentially contain an object of interest and b) correctly classifying the object inside the proposed rectangular regions. The RPN 1230 looks at the feature maps produced by the convolutional layers 1220 and proposes rectangular regions that may contain an object of interest. The object classifier 1240 looks at the feature maps and each region proposed by the RPN 1230 and classifies each region as one of the objects of interest or not. The object classifier can generate a score from 0.0 to 1.0 related to the confidence that the object is not present (0.0) or present (1.0). The classification can be binary or multiclass.

Training the object detector requires finding the right weights/parameters associated with each of these three components. Manually labeled bounding boxes and object labels are used to guide the process of finding the correct weights using a backpropagation algorithm. Using an alternate or additional training method, the RPN 1230 is first trained and the region proposals are used to train the object classifier 1240. The network tuned by the object classifier 1240 can then be used to initialize the RPN 1230, and this process is iterated. In this way, the convolutional layers 1220 are tuned to be effective for both the RPN 1230 and the object classifier 1240.

In the execution phase, a trained object detector 1250 is used to detect objects (e.g., bounding boxes, class labels, and confidence levels) in an image not in the training set. In addition to the class label, the trained object detector 1250 also returns the confidence measure for every bounding box.

FIG. 14 is a system block diagram illustrating an object detector web API. A web server accepts requests from multiple clients and returns the response for the respective request. An application server runs applications in threads, maintaining the correspondence between threads and requests passed from the webserver. An application on the application server runs the object detection algorithms and returns detections in the form of objects to the application server, which passes the detections to the webserver, which passes the response to the client machine.

In some implementations, a high-confidence output from the agent computation component can be used to train one or more artificial intelligence systems forming the machine computation component. When a high-confidence output is received from the agent computation component, the analysis platform can train an artificial intelligence system using the high-confidence agent computation component output as the supervisory signal and the sensor data as the input signal. Thus, the analysis platform can continually improve in performance and require fewer agent computation component queries to perform the same amount of work. When the confidence measure returned by the machine computation component is low, the image can be sent to an agent who can correct any mistakes in bounding boxes or labeling. Images that have incorrect bounding boxes and/or misclassified labels can be fixed and added to the training set. The system is continuously getting better as it is routinely retrained after the addition of these harder examples to the training set.
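
This feedback loop can be sketched as a routing function. It is a simplified illustration: the function name, the two thresholds, and the list used as a training set are assumptions, and retraining itself is not shown.

```python
def route_result(sensor_data, machine, agent, training_set,
                 machine_threshold=0.9, agent_threshold=0.9):
    """Return a label, consulting an agent when the machine is unsure.

    machine and agent are callables returning (label, confidence).
    """
    label, confidence = machine(sensor_data)
    if confidence >= machine_threshold:
        return label
    # Low machine confidence: ask an agent to confirm or correct the output.
    label, agent_confidence = agent(sensor_data)
    if agent_confidence >= agent_threshold:
        # High-confidence agent output becomes the supervisory signal,
        # paired with the sensor data as the input signal.
        training_set.append((sensor_data, label))
    return label


training_set = []
result = route_result(
    "frame-017",
    machine=lambda data: ("car", 0.55),
    agent=lambda data: ("motorcycle", 0.97),
    training_set=training_set,
)
```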

FIG. 5 is a system block diagram of an example analysis platform 500 that is a system of software components for combining machine and human intelligence as a solution for responding to questions and problem scenarios, for example, relating to security. A customer can provide a problem specification, desired questions or tasks to be performed, and raw inputs (e.g., sensor data such as video). The platform 500 can be configured to provide answers or matches (e.g., results) to the customer.

The example platform 500 is a reactive system of cooperating software components and the communication flows between them. A reactive system is one that is responsive, resilient, elastic and message driven. As such, new functionality can be added to platform 500 easily to extend the overall system capabilities. Each software component can include a microservice, which can be a fully encapsulated and deployable software component capable of communicating with other platform software components by event-based messaging or directly.

Platform 500 can be a distributed, open platform which can service many projects simultaneously (multi-tenant). Being an open platform means that there are well defined and formalized communication and messaging specifications by which loosely-coupled participants can easily join to enhance the overall system capabilities and functionality. The platform 500 provides many core services that participating components will be able to utilize for common purposes, such as: common data formats; transaction logging/auditing; project specifications; monitoring and health management; message routing; human and machine intelligence integration; and third party integrations.

In some implementations, platform 500 follows a microservices architectural approach for rapidly building independent, functionally bounded components that collaborate to provide an end-to-end solution. Collaboration among components can be designed along both event-driven and service-oriented architectures. Workflow orchestrations can be both ad-hoc by providing a core publish/subscribe system and formal via standard web service representational state transfer (REST) API endpoint.

Platform 500 includes an event messaging system 505 and a number of distributed microservices (510, 515, 520, 525, 530, 535, 540, 545, 550, and 555). The distributed microservices are components or modules of the platform 500 and communicate via the event messaging system 505. With regard to the event messaging system 505, a principal communication mechanism for microservices is event-based messaging. Advanced Message Queuing Protocol (AMQP) is an example protocol having distributed queue management and publish/subscribe semantics.

A component of AMQP is the exchange 600, illustrated in FIG. 6. An exchange accepts messages and routes them to queues according to the queue binding type and/or subscription matches. Topic-based exchanges allow for consumer queue subscriptions with a routing key pattern, including both wildcards and explicit matching requirements. Messages that match a routing key are delivered to the consumer's queue. Another component of AMQP is the queue. A message queue may be either specific to a consumer or shared amongst consumers (worker queue). A consumer must acknowledge messages as processed from a queue. Messages that are not acknowledged, by possibly a consumer exiting or crashing, will be re-queued for future delivery.
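
The routing key pattern matching performed by a topic-based exchange can be illustrated with a small matcher. In AMQP topic exchanges, a routing key is a list of words separated by dots, `*` in a binding pattern matches exactly one word, and `#` matches zero or more words. The function below is a stand-alone sketch of that matching rule, not a message broker client, and the routing keys shown are hypothetical.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""

    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb zero words, or absorb one word and stay active.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), routing_key.split("."))


# Hypothetical routing keys for task reports.
one_word = topic_matches("task.*.report", "task.42.report")
any_depth = topic_matches("task.#", "task.42.report.final")
```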

Whispering can be the ability for any component to directly converse with any other component. The target component must support whispering and must be listening to the global whisper exchange. The routing key specified by the caller designates which platform 500 component will get the whisper message. If the whisper message is bi-directional, then the caller must also provide a “reply-to” queue which will receive the response.

Referring again to FIG. 5, microservices include the smart media processor (SMP) 510, health and quality services 515, task director services 520, machine analytic services 525, data management services 530, media management services 535, record keeping services 540, alert messaging services 545, audit and record tracking 550, and agent management services 555. Because the event messaging system 505 is flexible and expandable, additional or fewer microservices are possible. The platform 500 includes an interface to a customer 560, which can include one or more security systems, each having one or more assets providing sensor data to the SMP 510.

Smart media processor 510 can include a software component that can process one or more video stream sources and route workable multimedia to the platform 500. It can be easily configured and modified via the platform 500 communication to alter its operating behavior. It can also be tasked to obtain additional multimedia on demand (e.g., an x-minute clip before/after some time for some asset).

Health and quality services 515 monitors all platform 500 participants for health and quality. Data management services 530 maintains customer account/project level and dynamic state data that all platform 500 participants may need access to or contribute to.

Media management service 535 manages all multimedia resource data obtained from customer assets and persists them in long-term storage. Alert messaging services 545 is responsible for determining the correct escalation procedures and executing them (notification, data collections, and the like) when a task result has been achieved. This can involve personal alarming, machine-to-machine integration, or both. Alert messaging services can alert customers via a defined mechanism (SMS, MMS, text, email, and the like) when triggered to do so. Record keeping services 540 and audit and record tracking 550 can record all raw data of platform 500 activity to a data warehouse and data lake for offline analysis and presentation.

Machine analytic services 525 integrate artificial intelligence and deep machine learning into platform 500. The machine analytic services 525 can include a machine computation component that includes an artificial intelligence (e.g., machine learning) algorithm that develops and utilizes a predictive model. Third party machine analytics services 527 may also be utilized by platform 500.

Agent management services 555 is for managing all aspects of human interaction and judgment aggregation. The agent management services 555 can include a platform that queries a pool of agents to process a task by, for example, answering a question regarding sensor data.

Task director services 520 is responsible for progressing the state of a task, starting a task upon proper initiation triggers, and determining when a task is completed for reporting. The task director services 520 serves as the director of various processing tasks and requests processing of a task by, and receives the results of processing from, the machine analytics services 525 and agent management services 555.

Within the platform 500 a task can be an instance of a modality in progress, which can include a solution state machine object. As the modality is a definition of the problem objective, the solution state machine is the “object” that maintains the state of processing for every trigger event received from the assets. Tasks are the workload of the platform 500. They can drive events and processing, and ultimately will end up as successful (accomplished the modality and satisfied the customer's requirements) or failed (did not accomplish the modality). All kinds of tasks can be in motion at any time within the platform 500 and the event-driven nature of the platform 500 can continuously move tasks toward a final state as new information becomes available from components.

Reports are the data results generated by participants against a specific task at a specific state. The task director 520 listens for all reports and uses the data in the report to determine the next state of the task. So, for example, if a task enters a NEED_AI state, there may be multiple machine computation components that start working to solve the current task. When each machine computation component has something to report back, it will create a report and publish it to a reports queue. Task director 520 will get these reports and use the measurement data in them to determine next steps for the task.

The role of the alerts messaging service 545 or escalation manager is to look at every successful “match” produced by the platform and determine the appropriate means of distributing that information out to the customer. Depending on how the customer has configured their project, they may wish to receive immediate alerts to one or more cell phones, or they may wish to have their internal system directly updated with the result information, or they may want both. In any of these cases, it is the escalation manager's 545 job to perform the proper routing of results to the customer.

Platform 500 uses escalation policies to help direct what should happen when results for tasks have been accumulated. The escalation manager 545 listens for results and then consults appropriate escalation policies to govern next actions. Escalation policies can fall under 2 types, alert and machine-to-machine. An alert policy governs what should happen upon a result to alert customers or customer representatives to the result. A machine-to-machine policy governs what should happen upon a result with respect to machine integration.

Alerts are push notifications to customers that indicate platform 500 has determined a security scenario has been solved according to the match solution state of the modality. An alert is specific to a modality and will only be triggered for orphan modalities. When an alert is triggered, an alert-type escalation policy is either created or checked to see if any previous alert has been acknowledged or not. Platform 500 will only send a new alert if all previous alerts have been acknowledged.

Machine-to-machine (M2M) is an integration strategy for having platform 500 relay results directly to a customer system (possibly in addition to any alerts). Unlike alerts, M2M escalations can occur regardless of the resultant state of the solution (MATCH, NOMATCH). There are 2 modes of M2M integration: direct M2M and web socket M2M.

Direct M2M is a mode of integration in which platform 500 makes a direct HTTP POST call to the target system and submits the match results. The target URL is provided in one of two ways: if the escalation data policy data payload map has a “callback” entry, then the URL is taken directly from that entry. If the escalation data policy data payload does not have a “callback” entry, then it is assumed the callback URL is given as meta-data with the artifact that was sent by the sensor/camera. In this case, the asset must add the correct “callback” metadata to the artifact upload. Note that in either form of the DIRECT M2M mode, the callback URL must be accessible as a public service endpoint. This may mean that firewall port forwarding or other techniques should be employed to allow traffic to flow from platform 500 to a target system. If it is not possible to provide public accessibility to the target system callback endpoint, then the WEB SOCKET M2M mode should be used instead.
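
The two-step lookup of the callback URL can be sketched as follows. Only the “callback” key comes from the description; the function name, the dictionary representation of the payload and metadata, and the example URLs are hypothetical, and the HTTP POST itself is not shown.

```python
def resolve_callback_url(policy_payload, artifact_metadata):
    """Pick the M2M callback URL.

    The escalation policy payload takes precedence; otherwise the URL is
    read from the metadata uploaded with the artifact.
    """
    url = policy_payload.get("callback")
    if url is None:
        url = artifact_metadata.get("callback")
    if url is None:
        raise LookupError(
            "no callback URL in policy payload or artifact metadata"
        )
    return url


from_policy = resolve_callback_url(
    {"callback": "https://customer.example/results"},
    {"callback": "https://camera.example/hook"},
)
from_artifact = resolve_callback_url(
    {}, {"callback": "https://camera.example/hook"}
)
```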

Web socket M2M is a mode of integration that utilizes a platform 500 provided tool (MosaiqM2MRelay) which will relay M2M messages from platform 500 to the target system using a web socket connection to platform 500. The MosaiqM2MRelay application must be run from within the internal network and must have direct accessibility to the desired target callback URLs. It will create a private and secure web socket connection to platform 500, through which, any M2M messages for the target system will be relayed.

The target callback URL can be provided the exact same way as in the DIRECT M2M mode, however, the difference is that instead of POSTing directly to the callback from platform 500, the request is directed through a web socket to the MosaiqM2MRelay, which then proceeds to directly call the target callback URL. The resultant POST request on the target system can be exactly the same in either case.

In order to speed integration efforts to platform 500 for external as well as internal participants, a platform SDK can be provided in multiple languages in order to abstract away from the developer some of the core and necessary logic and communications. Some of these core components to include in an SDK can be: lifecycle events, event messaging publishing/subscribing, persistent data access, and the like.

Each participant lifecycle can include startup and shutdown events that will signal to others in the platform 500 that a new capability is now available or is now leaving. This registers the participant for uptime monitoring by a monitoring manager.

Flow controls are the mini-workflows captured within a modality. A flow control can be as simple or complex as needed to implement a particular modality. Generally, modalities can then be combined to form scenarios. Flow controls are executed within the scope of a task.

A flow control can be a state machine. Upon transitioning to a state, generally, the task director 520 can publish the new state to the event messaging system 505. Interested platform 500 participants can then accept the state data and eventually publish reports back to the system. Task director 520 can listen for these reports and apply them to the flow control to see if it can transition to a new state. Transitioning is defined in the flow control definition as a collection of measurement criteria sets. Each criteria set is applied to the report measurement groups to see if there is an exact match. The first exact match will initiate the transition rule for that criteria set (which is a new state transition).
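
The first-exact-match evaluation of measurement criteria sets can be sketched as follows. The representation of a criteria set as a dictionary, the measurement names, and the NEED_AGENT state are assumptions for illustration; END_SUCCESS is used as an example end state.

```python
def next_state(transitions, report):
    """Return the state for the first criteria set matched by the report.

    transitions: ordered list of (criteria, new_state), where criteria maps
    a measurement name to its required value. A criteria set matches when
    every entry equals the corresponding report measurement. Returns None
    when no criteria set matches, so the task stays in its current state.
    """
    for criteria, new_state in transitions:
        if all(report.get(name) == value for name, value in criteria.items()):
            return new_state
    return None


transitions = [
    ({"person": True, "confident": True}, "END_SUCCESS"),
    ({"confident": False}, "NEED_AGENT"),
]
confident = next_state(transitions, {"person": True, "confident": True})
unsure = next_state(transitions, {"person": True, "confident": False})
```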

A fork/join is a special type of flow control construct in which a single input is forked into multiple outputs, and then those multiple outputs are joined into a single flow again. The task director 520 does not proceed to the next step until all reports have been received and accumulated. There are two types of fork/join nodes: fork implicit and fork explicit.

The fork implicit type of fork node definition specifies a single fork transition step, which will be executed concurrently for each subgroup of the state input. A use case of this can be to execute certain logic for each measurement group. The task director 520 creates independent event publications for each sub-grouping and will coordinate the reports into a join continuation. A fork implicit can be defined as a single transition node (the logic node that each subgroup is to execute), with a single measurement criteria called “groupBy”. The value of the “groupBy” measurement will define how the node should create its implicit subgroups. Two types of sub-grouping can be supported, namely, “measurementGroup” and “artifactGroup”.

Once the group of individual tasks that will be forked is determined, the forking logic can also look for a special “artifact_roi” key in the state flow data. If this key is found, then it represents a measurement name that contains an ImageROI definition in each group. Task director 520 can crop this region of interest from the original artifact and use it as the primary artifact of the forked task. If there is no “artifact_roi” key or if the measurement cannot be found on a specific group, then the forking artifact is used for that task. Thus helper logic can be utilized when some region of interest artificial intelligence has run upstream of the forking to determine the regions on the artifact.
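
The region-of-interest handling during an implicit fork can be sketched as follows. Only the “artifact_roi” key comes from the description; the function name, the dictionary shapes, and the caller-supplied crop function are hypothetical.

```python
def fork_implicit(groups, original_artifact, state_flow_data, crop):
    """Create one forked task per measurement group.

    groups: list of dicts of measurements, one per subgroup.
    crop: callable (artifact, roi) -> cropped artifact, supplied by the caller.
    If state_flow_data names an ROI measurement and a group contains it, the
    cropped region becomes that task's primary artifact; otherwise the
    forking artifact is used unchanged.
    """
    roi_key = state_flow_data.get("artifact_roi")
    tasks = []
    for group in groups:
        roi = group.get(roi_key) if roi_key else None
        artifact = (
            crop(original_artifact, roi) if roi is not None
            else original_artifact
        )
        tasks.append({"artifact": artifact, "measurements": group})
    return tasks


# Stub crop that records the region instead of cutting pixels.
tasks = fork_implicit(
    groups=[{"box": (0, 0, 10, 10)}, {"label": "unknown"}],
    original_artifact="image-1",
    state_flow_data={"artifact_roi": "box"},
    crop=lambda artifact, roi: (artifact, roi),
)
```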

The fork explicit type of fork node definition specifies explicit fork transitions for all the input data to be executed concurrently. A use case of this can be to execute multiple independent forms of logic on the same input set (e.g., different AI classifiers). Task director 520 can create independent events and pass the same inputs to each one as defined by the flow control.

Every fork/join node has a corresponding join node. The join node is denoted in the fork definition as the last transition in the list and has an empty list of criteria. When all reports have been accumulated for a fork, then the join node is executed. There are 2 types of join nodes: join aggregate and join select.

The join aggregate type of join node aggregates all of the forked tasks and adds a special set of report measurements to indicate the number of reason states for all the tasks. So, for example, the final join report contains “joined:END_SUCCESS”, “joined:END_FAIL”, and the like. These can be used to decide next steps of the aggregation in the flow control. Additionally, the flow can provide a “filter” state flow data key which can be set to either “MATCH” or “NOMATCH”, which will filter all the final results to only those aggregated tasks that have that code. The join select type of join node can select a single report from the forked reports using the specified criteria.
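As an illustration of the two join types, the sketch below (Python) counts end reasons into "joined:" measurements, applies the optional "filter" key, and selects a single report by criteria. The report fields `reason` and `code` and the function names are assumptions made for the example; only the "joined:" prefix, the "filter" key, and the MATCH/NOMATCH values come from the description.

```python
from collections import Counter


def join_aggregate(reports, flow_data=None):
    """Aggregate forked reports into one join report (hypothetical layout)."""
    flow_data = flow_data or {}
    # Count how many forked tasks ended with each reason.
    counts = Counter(r["reason"] for r in reports)
    joined = {f"joined:{reason}": n for reason, n in counts.items()}
    # The optional "filter" key keeps only tasks whose code equals the filter.
    wanted = flow_data.get("filter")
    kept = [r for r in reports if wanted is None or r["code"] == wanted]
    return {"measurements": joined, "tasks": kept}


def join_select(reports, criteria):
    """Return the first report whose fields equal every criterion, if any."""
    for r in reports:
        if all(r.get(k) == v for k, v in criteria.items()):
            return r
    return None


reports = [
    {"reason": "END_SUCCESS", "code": "MATCH"},
    {"reason": "END_FAIL", "code": "NOMATCH"},
    {"reason": "END_SUCCESS", "code": "MATCH"},
]
result = join_aggregate(reports, {"filter": "MATCH"})
print(result["measurements"])
```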

Every concurrent execution path for a fork node can execute its own independent flow control. These inner flow controls can begin and end like any flow control with correct end states. When an end state is reached for the inner control flow, that entire path is marked as ready for joining. When all forked inner flows complete, the task director 520 can execute the join node logic and continue the flow.

A flow control can instruct an executing task to spawn another task for continuing the flow. In this case, the task that spawned the new task is completed with an END_SPAWNED reason. A flow can spawn a new task if the artifacts and/or goals of the flow have changed from when the original task started. For example, a flow may begin from an initial task with an image artifact and, through the flow, create new artifacts as a grouping of regions from the original image. The flow can then spawn a new task whose artifact is the new artifact group for further processing, rather than the original single image artifact. In this case, spawning preserves the original task for display (with an END_SPAWNED reason) and can continue the workflow on the new task.

More than one flow control can be used. For example, the following are example flow control definitions.

Name: auto-success-flow
Description: Immediately ends in success.
    Final artifact is the exact input artifact.

Name: iq-verify-flow
Description: Immediately requests agent computation. Answers from agent computation are verified against the modality match answers.
    match with confidence > .8 is ended successful
    non-match with confidence > .8 is ended failed
    any other result with confidence <= .8 is ended no_confidence
    Final artifact (upon success) is the exact input artifact.

Name: iq-multi-verify-flow
Description: Expects to receive an input task that should be spawned into multiple agent computation tasks based on either a measurementGroup or artifactGroup grouping. Each individual spawned task will execute its own flow to completion. All answers for all tasks are verified against the modality match answers.
    match with confidence > .8 is ended successful
    non-match with confidence > .8 is ended failed
    any other result with confidence <= .8 is ended no_confidence
    Final artifact for each task (upon success) is the artifact created as part of the fork logic.

Name: iq-annotate-flow
Description: Immediately requests agent computation. Answers are not verified against any match answers.
    any answer with confidence >= .25 is ended successful
    any answer with confidence < .25 is ended no_confidence
    Final artifact (upon success) is the exact input artifact.

Name: ai-verify-flow
Description: Immediately requests AI. Answers are verified against the modality match answers.
    match with confidence > .8 is ended successful
    non-match with confidence > .8 is ended failed
    any other result with confidence <= .8 is ended no_confidence
    Final artifact (upon success) is the exact input artifact.

Name: ai-detect-and-isolate-flow
Description: Initially performs AI event detection and gathers regions of interest (ROI) changes. For each ROI, requests AI, whose answers are verified against the modality match answers.
    match with confidence >= .90 is successful and the flow adds this report to the joined node
    non-match with confidence >= .90 is failed and the flow does not add the report to the joined node
    any other result with confidence < .90 is ended no_confidence and the flow does not add the report to the joined node
    If the number of successful reports at the joined node > 0, then the success report artifacts (ROIs) are grouped into a group artifact and a final task is spawned with an immediate end success for the entire modality.
    If the number of successful reports at the joined node == 0, then the modality is finished immediately as failed.
    Final artifact (upon success) is a grouped artifact consisting of the successfully classified ROIs.

Name: ai-iq-detect-and-isolate-flow
Description: Initially performs AI event detection and gathers regions of interest (ROI) changes. For each ROI, requests AI, whose answers are verified against the modality match answers. If the AI has insufficient confidence in its answer, then the task is sent to agent computation to validate the answer.
    match with confidence >= .95 is successful and the flow adds this report to the joined node
    match with confidence < .95 is sent to agent computation component for confirmation of the answer
    any non-match result is ended failed and the flow does not add the report to the joined node
    For the NEED_IQ confirmation:
        match with confidence >= .90 is successful and the flow adds this report to the joined node
        non-match with confidence >= .90 is failed and the flow does not add the report to the joined node
        any other result with confidence < .90 is ended no_confidence and the flow does not add the report to the joined node
    If the number of successful reports at the joined node > 0, then the success report artifacts (ROIs) are grouped into a group artifact and a final task is spawned with an immediate end success for the entire modality.
    If the number of successful reports at the joined node == 0, then the modality is finished immediately as failed.
    Final artifact (upon success) is a grouped artifact consisting of the successfully classified ROIs.

Name: ai-detect-verify-flow
Description: Initially performs AI event detection, and then sends the ROIs to the next step of AI classification using the classifier on the same artifact. Each classified ROI is verified against the modality match answers for the first match.
    match with confidence >= .90 is successful and the modality ends successful
    non-match with confidence >= .90 is failed and the modality ends failed
    any other result with confidence < .90 continues the flow to the next step
    The next step, if reached, is to request agent computation component for the initial artifact. The answer is verified against the modality match answers.
        match with confidence >= .80 is successful and the modality ends successful
        non-match with confidence >= .80 is failed and the modality ends failed
        any other result with confidence < .80 is failed and the modality ends no_confidence
    Final artifact (upon success) is the exact input artifact.
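The confidence rules listed for iq-verify-flow and for the first stage of ai-iq-detect-and-isolate-flow can be expressed compactly. The following Python sketch is illustrative only: the function names and return strings are assumptions, while the thresholds (.8 and .95) and the NEED_IQ label are taken from the flow definitions above.

```python
def iq_verify_outcome(is_match: bool, confidence: float) -> str:
    """End state for iq-verify-flow given a verified agent answer."""
    if confidence > 0.8:
        return "successful" if is_match else "failed"
    return "no_confidence"        # any result with confidence <= .8


def ai_iq_first_stage(is_match: bool, confidence: float) -> str:
    """First (AI) stage of ai-iq-detect-and-isolate-flow for one ROI."""
    if not is_match:
        return "failed"           # any non-match result is ended failed
    if confidence >= 0.95:
        return "successful"       # report is added to the joined node
    return "NEED_IQ"              # escalate to agent computation


print(iq_verify_outcome(True, 0.85), ai_iq_first_stage(True, 0.90))
```

Note that the two flows treat a low-confidence match differently: iq-verify-flow ends with no_confidence, whereas ai-iq-detect-and-isolate-flow hands the ROI to an agent for confirmation.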

FIG. 7 is a data flow diagram illustrating data flow between components of platform 500 during a process of augmenting artificial intelligence with human intelligence tasks, for example, as described with reference to FIG. 1. At 705, the task director 520 receives sensor data. The task director 520 can receive the sensor data using the event messaging system 505. The task director 520 can determine whether a predefined modality exists for the asset from which the sensor data originated. At 710, the task director 520 can send a request for a predefined modality to the data manager 530. Data manager 530 can retrieve the predefined modality from a database and, at 715, provide the task director 520 with the predefined modality.

At 720, task director 520 can instantiate the solution state machine that is specified by the predefined modality. The solution state machine can have a number of states. Task director 520 can effectuate and direct processing flow as specified by the solution state machine. By way of example, the remainder of the description of FIG. 7 assumes the predefined modality specifies the example solution state machine illustrated in FIG. 2. The solution state machine is in the initial state “S”, so task director 520 transitions the current state of the solution state machine according to the transition rules, which results in the solution state machine having a current state of “MI”. “MI” state is associated with a machine computation component, which in platform 500 can be machine analytics 525. At 725, task director 520 requests processing of the task by machine analytics 525. Machine analytics 525 can process the task, for example, by performing image processing and classifying the image. At 730, machine analytics 525 can send the result of its processing of the task to task director 520, which can receive the results. The results can include a confidence of the machine analytics 525 result.

At 735, task director 520 can transition the state of the solution state machine. For the example solution state machine illustrated in FIG. 2, the current state of the solution state machine can transition to either “MI2” or “HI” states depending on the confidence value returned by machine analytics 525. Assuming the task is one that is challenging for an artificial intelligence algorithm to solve, and the confidence value returned by the machine analytics is low (e.g., 0.2), then task director 520 can apply the transition rules (“C<0.3”) and transition the solution state machine to the “HI” state.

At 740, task director can request agent management services 555 to perform processing on the task. Agent management services 555 can receive the prior processing result. Agent management service 555 can query a pool of agents by submitting the sensor data and the agent form contained in the predefined modality to one or more of the agents. Agent management service 555 can receive the completed agent form from the agent (e.g., a client associated with the agent). Agent management service 555 can create a composite agent result where more than one agent is queried and can determine a composite confidence measure. At 745, agent management service 555 can send the query result and confidence measure to task director 520.
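The description does not state how a composite agent result or composite confidence measure is computed. Purely as one possible illustration, the Python sketch below uses a weighted vote in which the composite confidence is the winning answer's share of the total weight. The function name, the `(answer, weight)` input format, and the voting scheme itself are all assumptions.

```python
from collections import defaultdict


def composite_result(answers):
    """Combine agent answers given as (answer, weight) pairs.

    Assumed scheme: weighted vote; weights must be positive.
    """
    if not answers:
        raise ValueError("at least one agent answer is required")
    totals = defaultdict(float)
    for answer, weight in answers:
        totals[answer] += weight
    best = max(totals, key=totals.get)
    confidence = totals[best] / sum(totals.values())
    return best, confidence


votes = [("person", 1.0), ("person", 1.0), ("no person", 1.0), ("person", 1.0)]
print(composite_result(votes))
```

With equal weights this reduces to a simple majority vote; per-agent weights could instead reflect historical agent accuracy.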

At 750, task director 520 can advance the current state of the solution state machine. In the case that the confidence measure received from agent management service 555 is not definitive (e.g., 0.5), task director 520 can apply the transition rules (e.g., 0.4<C<0.9) and transition the solution state machine to state “MI2”, which is associated with another machine computation component.

Task director 520 can, at 755, request processing of the task by the machine analytics 525 component. Machine analytics 525 can process the task, for example, by performing image processing and classifying the image. The underlying artificial intelligence system used can be a different system than that used in steps 725 and 730. In some implementations, the underlying artificial intelligence system used can be the same but can use the prior agent management 555 result and/or the prior machine analytics 525 result. In this manner, machine analytics 525 can either try a new approach (e.g., an ensemble) or refine previous results.

At 760, machine analytics 525 can send the result of its processing of the task to task director 520, which can receive the results. The results can include a confidence of the machine analytics 525 result.

At 765, task director 520 can transition the state of the solution state machine. Assuming the machine analytics 525 result was a high confidence (e.g., 0.95), task director 520 can transition the solution state machine to the terminal state “ES”, which signifies that the task is completed with high confidence and so the task processing has been successful.

At 770, task director 520 can provide the outcome of the task processing, which can include whether or not platform 500 was able to come to a high-confidence output and the classification, matching, or determination of the sensor data. (For example, task director 520 can provide whether the processing outcome is accurate and whether platform 500 detected the presence of a person in the sensor data image.)
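The walk through the solution state machine at 720 through 765 can be traced with a small sketch. The Python code below encodes only the transitions mentioned in the description of FIG. 7 (S to MI, MI to HI when C<0.3, HI to MI2 for a non-definitive confidence between 0.4 and 0.9, and MI2 to ES on high confidence). The function names, the fallback from MI to MI2, and the 0.9 cut-off for reaching ES are assumptions, since the full rule set of FIG. 2 is not reproduced here.

```python
def advance(state, confidence=None):
    """Return the next state, or None when no rule is given in the text."""
    if state == "S":
        return "MI"                        # initial state moves straight to MI
    if state == "MI":
        # C < 0.3 goes to the agent; otherwise assumed to go to MI2.
        return "HI" if confidence < 0.3 else "MI2"
    if state == "HI" and 0.4 < confidence < 0.9:
        return "MI2"                       # agent result not definitive
    if state == "MI2" and confidence >= 0.9:   # assumed cut-off
        return "ES"
    return None


def run(results):
    """Walk the machine using one confidence value per visited state."""
    state, path = "S", ["S"]
    while state not in ("ES", None):
        state = advance(state, results.get(state))
        path.append(state)
    return path


# Confidences from the example: machine 0.2, agent 0.5, second machine 0.95.
print(run({"MI": 0.2, "HI": 0.5, "MI2": 0.95}))
# prints ['S', 'MI', 'HI', 'MI2', 'ES']
```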

While the data flow illustrated in FIG. 7 is described as having components of platform 500 send and/or receive data directly from each other, it should be understood that the sending and receiving can be via the event messaging system 505. Further, the event messaging system 505 is not the only protocol that can be implemented with the current subject matter.

Platform 500 can include a core set of common data structures. Any participant may also define, store, and exchange proprietary data, beyond what the platform defines, for its own purposes or for those of other participants.

Platform 500 can provide a means of persisting long-lived data in a way that is efficient, scalable, and extensible to changes and evolution. Persistent data can be both general purpose and proprietary. General purpose data can be that which is meaningful and required by all participants within the platform, such as customer and project specifications. General purpose data formats will be defined by platform 500 and APIs will be exposed for accessing and contributing data. Proprietary purpose data can be that which is specific to a particular participant and/or functionality and which does not require sharing outside of its owner. An example of this can include agent performance/compensation records. Participants with proprietary data requirements can be responsible for establishing their own storage management.

Customer and project data can describe everything about a particular customer and project. A customer can have multiple projects. Each project can contain the information necessary to understand the raw input sources, the problem specification, the points of contact, and the like. A customer can have a unique identifier as well as each project within that customer. Hence, a customer with identifier C123 could have projects P1 and P2 and the canonical representation of C123::P1 will uniquely identify that project for that customer across the global platform 500 namespace.
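The canonical customer and project representation can be illustrated with a small helper. The Python sketch below is an assumption-laden example: the class name `ProjectKey` and its methods are hypothetical, and only the `C123::P1` form comes from the description.

```python
from typing import NamedTuple


class ProjectKey(NamedTuple):
    """Customer identifier plus project identifier (hypothetical type)."""
    customer_id: str
    project_id: str

    def canonical(self) -> str:
        # Join the two identifiers with "::" as in the C123::P1 example.
        return f"{self.customer_id}::{self.project_id}"

    @classmethod
    def parse(cls, text: str) -> "ProjectKey":
        customer_id, sep, project_id = text.partition("::")
        if not sep or not customer_id or not project_id:
            raise ValueError(f"not a canonical project key: {text!r}")
        return cls(customer_id, project_id)


key = ProjectKey("C123", "P1")
print(key.canonical())
```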

Both static and dynamic information can be used. Static information can be the data representing the customer account, as well as the problem specifications and details. Static data can be updated over time, but it is not generally going to change in the course of task execution. Dynamic data, on the other hand, can be continuously created and updated over the course of any task execution.

Raw data can include a raw dump (e.g., data lake) representation of all the runtime data generated within the platform. A purpose of this data is to provide a source for auditing flow streams to replay data or examine triggers that resulted in a particular result. Additionally, this data source can serve deeper analytics to calculate solution precision/recall, as well as serve as an endpoint for 3rd party integrations with project solutions and results.

Platform 500 can save both structured and unstructured data for future business intelligence. The persistence technology for this data can support both unstructured (or semi-structured) data and fully-structured data sets, which can support arbitrary querying and OLAP queries. Some technologies able to handle this type of data at large scale and minimal costs include HADOOP/SPARK; AWS REDSHIFT (OLAP); and MONGODB (NOSQL). Redshift provides a large amount of out-of-box integration capability with existing business intelligence tools, such as Tableau or Periscope. Hadoop/Spark provides a common architecture and toolset for data scientists to work with; underlying raw, unstructured data can be manipulated and transformed to produce structured insights and results which can then be visualized by business intelligence visualization tools.

Platform 500 can include scalable streaming middleware, which can handle buffering, batching and applying the data to the target persistence technology.

Multimedia data can be a source of multimedia resource data (e.g., video, images, sounds, and the like) that is presented to the platform 500 for processing. Resource data can be stored in native format, associated with a globally unique identifier, and can be directly accessible by platform 500 participants. Platform 500 can provide an SDK and/or service API (e.g., multimedia manager) for storing and retrieving raw multimedia data. Internally, these resources can be saved to persistent storage either by remote API calls or directly to the local file system. FIG. 8 is a block diagram illustrating some meta-data of the platform 500.

Although a few variations have been described in detail above, other modifications or additions are possible. For example, platform 500 can be cloud capable, as opposed to cloud based. The purpose of this can be to leverage cloud technology and infrastructure as much as possible and when possible. When it is not possible, such as deployment within a secure facility or environments without internet accessibility, then all major core components of platform 500 can be executable and can operate normally without cloud access. Running platform 500 within a cloud infrastructure can provide benefits, including: virtually unlimited storage and compute processing, integration with other public services, centralized monitoring, detached resource dependencies and more. Running platform 500 within a non-cloud/local environment can require dedicated resources. One or more components of platform 500 can reside internally within a customer facility and reach out to a larger, cloud hosted suite of platform 500 components for processing.

The following describes another example implementation of the current subject matter.

In some implementations, by including the human-in-the-loop the current subject matter is able to accomplish new kinds of tasks entirely (e.g., those that require human intelligence).

The current subject matter relates to utilizing a “human-in-the-loop” (symbiotic human-machine) approach in order to enable new capabilities of automated or non-automated machine decision systems by, for example, reducing false alarms associated with sensors and analytics as well as expanding the range of use cases to which a machine decision making system or a given sensor and/or analytic may effectively apply. In some implementations, the current subject matter can provide for injection of a human-computation element into a machine decision-making algorithm, allowing for a human to perform (or solve) specific and narrow decisions that the machine decision making system would otherwise be unable to perform (or would perform poorly). The current subject matter can expand the range of use cases to which a machine decision making system or a given sensor and/or analytic may effectively apply. The subject matter can be used with applications that do not currently include machine decision-making algorithms, for example a closed circuit television system that currently does not have a machine decision-making algorithm. The current subject matter can enable new capabilities and improve machine decision making, for example, by improving performance of correct classification, which can provide one or more of reducing false alarms, increasing performance of detection (e.g., hit), increasing performance of correctly determining a miss, and increasing performance of determining a correct rejection.

FIG. 15 is a system block diagram illustrating an example system 1500 that provides for injection of a human-computation element into a machine decision-making algorithm. The system 1500 may include a sensor 1505, analytics 1510, controller 1515, user interface 1520, and human computation element 1525.

The sensor 1505 may include a variety of sensor types: imaging, acoustic, chemical, radiation, thermal, pressure, force, proximity, or a number of other sensor types. “Sensor,” as used herein may include information that did not originate specifically from physical hardware, such as a computer algorithm.

Analytics 1510 may include a wide range of software analytics and development processes, which are methods and techniques that typically rely on gathering and analyzing information from sensor 1505. Analytics 1510 may include, but are not limited to, face recognition, people counting, object recognition, motion detection, change detection, temperature detection, and proximity sensing. Analytics 1510 may address a user's query of the system 1500 (e.g., a face recognition analytic if the user desires to understand who is entering his or her building). It also may serve to reduce the amount of sensor information sent to the human computation element 1525, or the amount of bandwidth, memory, computation, and/or storage needed by the system 1500. In some configurations, the system output can be obtained at low latency, in real-time or near (e.g., substantially) real-time.

Controller 1515 may include a tool that utilizes the output and characteristics of the sensor 1505 and/or analytics 1510 in conjunction with internal logic and/or in conjunction with a predictive model of human and machine performance to determine whether and how to utilize human computation element 1525. Controller 1515 may determine that information generated by sensor 1505 and/or analytics 1510 is sufficient to answer a given user query or given task, or controller 1515 may outsource certain tasks to humans (via human computation element 1525) based on system objectives and controller 1515 internal logic and/or a predictive model of human and machine performance. Controller 1515 may coordinate, via human computation element 1525, use of human intelligence to perform tasks that augment, validate, replace, and/or are performed in lieu of sensor 1505 and/or analytics 1510. Controller 1515 may be capable of collecting, interpreting, and/or integrating the results of human work into the machine decision making process and system. Controller 1515 may be capable of converting a user-defined task, that is either defined via natural language or via a more structured query, into a smaller task or series of smaller tasks, as it deems necessary, and into an output for an end user, using either sensor 1505 and/or analytics 1510 or human computation element 1525, or both.

In addition, controller 1515 may maintain statistics pertaining to the performance of sensor 1505 and/or analytics 1510 as well as human computation element 1525 and/or individual human workers or subpopulations of workers. These statistics may be used to improve the means of utilizing machine and human elements of the pipeline. System 1500 may be capable of gathering data that may be useful for improving the performance characteristics of system 1500, sensor 1505 and/or analytics 1510, or the human computation element 1525. Typically these data are selected because they are examples for which the sensor 1505 and/or analytics 1510 have low relative certainty or they are examples that are informative for improving characteristics of sensor 1505 and/or analytics 1510.

Human computation element 1525 utilizes human intelligence. A purpose of human computation element 1525 may be to aid system 1500 in its ability to address AI-hard or AI-complete problems that are difficult or impossible to solve reliably and/or cost effectively with sensor 1505 and/or analytics 1510 (e.g., software analytic technology) alone. Another purpose of incorporating human intelligence may be to perform tasks that augment or validate the sensor 1505 and/or analytics 1510 of system 1500. One example of this is using humans to validate the output of a computer vision analytic via a micro task involving imagery. Human computation element 1525 may also aid in the translation of tasks received from users. Task translation may range from none (e.g., if the task is given directly to humans) to minimal (e.g., if the task is given partly to computers and partly to humans, would benefit from formalization, or is decomposed and then executed by either computers or humans) to substantial (e.g., if the system determines it may be able to improve its effectiveness by translating the task substantially). The system may distribute a task in order to manage and improve characteristics such as throughput, latency, accuracy, and cost. Humans may also contribute innovative solutions into the system 1500, make incremental changes to existing solutions, or perform intelligent recombination. Human computation element 1525 may function as part of an ongoing process, which may be aimed at real-time or near-real time applications as well as at applications that require results at lower frequencies. System 1500 may utilize a task market such as AMAZON® Mechanical Turk, but is built in such a way that it may also incorporate many different kinds of human workers worldwide via other crowd work platforms or via a custom system interface. 
Examples of other crowd workers may include employees of an enterprise, off-duty or retired law enforcement professionals, subject matter experts, or on-duty personnel. The system may include a process for establishing and verifying credentials of the crowd workers for the purpose of meeting system objectives or improving system efficiency. Incentives to participation may include monetary compensation, volunteerism, curiosity, increasing reputation/recognition, desire to participate in a game-like experience, other motivation sources, and the like.

The end user interface 1520 may include an interface that combines alerts with a human-like means of interaction.

System 1500 is a closed loop system that can use sensor 1505 and/or analytics 1510 performance characteristics as well as human inputs (from the human computation element 1525 or from an end user) to improve its underlying performance characteristics relative to the challenges (e.g., AI-hard or AI-complete problems) the system 1500 confronts. The system 1500 may incorporate a scheme for collecting useful “ground truth” examples that correspond to these challenges. Data collected by system 1500 may be used to improve system characteristics using machine learning or other statistical methods.

FIG. 16A is a process flow diagram illustrating a method 1600 of injecting human-computation into a machine decision-making algorithm, allowing for a human to perform (or solve) specific and narrow decisions that the machine decision making system would otherwise be unable to perform (or would perform poorly). The particular example application of FIG. 16A is to detect graffiti using a vapor sensor and an imaging sensor. At 1605, a user may define a question to be answered by the system. For example, a user may define a question regarding whether graffiti is occurring (or has occurred) and who may be involved in the graffiti.

At 1610, the question may be translated into something that can be addressed programmatically with hardware, software, and humans. For example, the pseudocode at Table 1 may be used, which enables the human in the loop to work alongside one or more sensors to aid in solving more complex tasks. In the example of Table 1, if the vapor sensor is confident that there is nothing there, then end (no need to involve a human). If it is confident there is a high vapor condition, send a report (again, no need to involve a human). If there is medium confidence, ask the human in the loop to weigh in on the situation and inform the answer.

TABLE 1

if (vapor_sensor.ppm > 250)
    if (vapor_sensor.ppm > 750)
        if ( camera.person_holding_can )
            sendreport( )
    if ( camera.person_holding_can )
        sendreport( )

If ( vapor_sensor < 50ppm)
    End
If ( vapor_sensor > 750ppm)
    sendreport( )
If ( vapor_sensor is between 50ppm and 750ppm)
    Get_human_answer_on_whether_graffiti_occurring( )
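For illustration, the three-way vapor decision of Table 1 can be written as a short runnable Python function. This is a sketch rather than the claimed implementation: the function name and the returned action labels are assumptions, and readings of exactly 50 ppm or 750 ppm are assumed to fall in the medium band that is referred to a human.

```python
def vapor_action(ppm: float) -> str:
    """Map a vapor reading (in ppm) to the next action from Table 1."""
    if ppm < 50:
        return "end"            # confident nothing is there; no human needed
    if ppm > 750:
        return "send_report"    # confident high vapor condition
    # Medium band (assumed inclusive of 50 and 750): ask the human in the loop.
    return "ask_human"


for reading in (10, 300, 800):
    print(reading, vapor_action(reading))
```

The 50 ppm and 750 ppm cut-offs are those of Table 1; in practice they could be tuned from the receiver operating characteristics of the sensor.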

At 1615, a sensor (e.g., sensor 1505) assesses the situation (takes a measurement) and makes a decision (or guess), for example, low, medium, or high levels of vapor. For example, for vapor_sensor.ppm, the sensor is making its best guess as to whether a condition (detecting a vapor level associated with spray paint) exists. If the decision is negative (no vapors) then no graffiti is occurring and the assessment may terminate. If the decision is a medium level of vapors, there may or may not be graffiti, and human computation element 1525 may be employed, at 1620, to inject a human decision or review of the sensor assessment. The human may review the sensor data and render a decision regarding whether graffiti is occurring. The high, medium, or low assessment by the sensor may be a function of the receiver operating characteristics (ROC) of the sensor and may vary.

If the human decision indicates that graffiti is occurring, or the vapor sensor indicates with high reliability that vapor is present and so graffiti is occurring (so that no human input is required), at 1625 a second sensor, such as an imaging sensor, can assess the situation (e.g., take a measurement). FIG. 16B illustrates an example image containing graffiti. The imaging sensor may also render a decision with low, medium, and/or high likelihood that the imaging sensor has identified who is creating the graffiti. As with the vapor sensor, if the imaging sensor is confident in its determination, the system may proceed directly to 1630, where a report can be issued or no action taken. However, if the imaging sensor renders a decision with low confidence, at 1625, human-computation element 1525 may be used to allow a human to make the determination. The human may also weigh in using data from the vapor sensor and imaging sensor, if the vapor sensor couldn't benefit from human insight by itself or if it is costless to engage the imaging sensor.

Thus, the example method 1600 allows for adaptive behavior based on the confidence of decisions (or assessments) made by sensors. For example, if vapor sensor and imaging sensor both render confident results, the process may involve machine only decision making; if either vapor sensor or imaging sensor renders a not confident result (e.g., increased likelihood of an incorrect decision) then a human computation element may be injected into the machine decision loop to render a decision. The method may close the loop and allow human-generated ground truth to improve the algorithms used to process sensor data, the confidence threshold for each sensor, the weight of each sensor's information in the overall solution, and more.

FIG. 17 is a system block diagram illustrating an example implementation of the current subject matter for video/face recognition system 1700. The face recognition system 1700 may be able to determine whether people on a “watch list” (or people who belong to any notable subpopulation such as very important persons (VIPs), frequent shoppers, security threats, and the like) are entering a given facility. The face recognition system includes a video sensor 1705, an image analysis analytics 1710, a controller 1715, user interface 1720, human-computation element 1725, and learner 1730.

The video sensor 1705 can acquire images (for example, of a person), which are analyzed by the image analysis analytics 1710, which can generate a determination whether a person in the image is on the watch list. The controller 1715 can receive the decision and, based on a measure of confidence of the decision, determine whether to employ the human computation element 1725 to verify the decision. If the human computation element 1725 is employed, using, for example, Mechanical Turk or a similar service, a human will review the image with possible candidates from the watch list to determine if there is a match. When the face recognition analytics 1710 is incorrect (as identified by the human-computation element 1725), the human analysis of the mistake 1735 may be input to a learner 1730, which can use the data point to further train the face recognition analytics 1710 and improve performance. Thus, the human computation element aids in improving the performance of the machine element over time and provides feedback.
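The verification and feedback loop described for system 1700 can be sketched in Python as follows. All names (`verify_decision`, `ask_human`, `training_examples`) and the 0.9 confidence threshold are assumptions chosen for the example; the description only states that the controller decides based on a measure of confidence and that human-identified mistakes are passed to the learner.

```python
def verify_decision(image, machine_answer, confidence, ask_human,
                    training_examples, threshold=0.9):
    """Return the final answer, consulting a human when confidence is low.

    `ask_human` stands in for the human computation element and
    `training_examples` for the data handed to the learner (both assumed).
    """
    if confidence >= threshold:
        return machine_answer              # machine-only decision
    human_answer = ask_human(image)
    if human_answer != machine_answer:
        # A machine mistake: keep the corrected example for retraining.
        training_examples.append((image, human_answer))
    return human_answer


examples = []
answer = verify_decision("frame_001", "on_watch_list", 0.55,
                         lambda img: "not_on_watch_list", examples)
print(answer, examples)
```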

FIGS. 18 and 19 are process flow diagrams illustrating using the current subject matter for face recognition and using the face recognition system 1700.

The system described in FIG. 17 is not limited to face recognition. For example, the system may be used to handle a wide variety of tasks, such as counting sports utility vehicles (SUVs) in a parking lot or validating computer vision analytic performance, as shown in FIGS. 20 and 21.

In some implementations, the current subject matter may incorporate an automatic enrollment process whereby a user may contribute examples of data that are either positive examples, negative examples, or examples that are necessary for effective system operation. The current subject matter may efficiently solicit, gather, and catalogue these data. For instance, in the case of face recognition, users may contribute images of people whom they desire to identify, and the current subject matter may gather and catalogue these face images. These images may be used to train analytics and/or humans as well as to guide system outputs according to the system's internal logic and the need expressed by the user.
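As a rough illustration of gathering and cataloguing contributed examples, the sketch below stores image references per identity, separated into positive and negative examples. The `EnrollmentCatalogue` class, its method names, and the file names are hypothetical and are not taken from the described system.

```python
from collections import defaultdict


class EnrollmentCatalogue:
    """Minimal in-memory catalogue of user-contributed examples (illustrative)."""

    def __init__(self):
        # Maps (identity, is_positive) to a list of image references.
        self._examples = defaultdict(list)

    def enroll(self, identity, image_ref, positive=True):
        """Record one contributed example for the given identity."""
        self._examples[(identity, positive)].append(image_ref)

    def examples(self, identity, positive=True):
        """Return a copy of the stored examples for the given identity."""
        return list(self._examples.get((identity, positive), []))


catalogue = EnrollmentCatalogue()
catalogue.enroll("alice", "alice_01.jpg")
catalogue.enroll("alice", "alice_02.jpg")
catalogue.enroll("alice", "not_alice_01.jpg", positive=False)

print(catalogue.examples("alice"))                  # ['alice_01.jpg', 'alice_02.jpg']
print(catalogue.examples("alice", positive=False))  # ['not_alice_01.jpg']
```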

Configuration examples may include:

Systems addressing physical security, safety, or asset protection needs.

Systems addressing the improvement and/or monitoring of retail environments.

Systems addressing real-time sensor feeds.

Systems addressing historic sensor feeds.

Systems incorporating multiple sensors.

Systems addressing residential, education, medical, financial, entertainment, industrial, transportation, commercial, law enforcement, military, or governmental applications.

FIG. 22 is a block diagram illustrating an example of hardware 2200 used by the current subject matter, which may include one or more sensors coupled with a CPU and/or GPU. The device may perform a portion of its processing locally (onboard device) and a portion of its processing remotely (e.g., using cloud-based computation). This computational scheme may be in place in order to efficiently utilize bandwidth, storage, and device memory, while facilitating the efficient implementation of the aforementioned human-in-the-loop process. The hardware may be designed in such a way that additional sensors are readily supported via a bus-modular system approach. In addition, the hardware incorporates a means to communicate through a network, such as WiFi or Cellular network.
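One possible way to divide work between the device and the cloud is sketched below. The description does not specify how the split is made; this example assumes, purely for illustration, that an inexpensive onboard score decides which frames are worth uploading. The `split_processing` function, the `motion` field, and the threshold are hypothetical.

```python
def split_processing(frames, onboard_score, upload_threshold=0.5):
    """Partition frames into those kept on the device and those sent to the cloud.

    Frames whose onboard score meets the threshold are treated as worth the
    bandwidth of remote processing; the rest are handled locally.
    """
    local, remote = [], []
    for frame in frames:
        if onboard_score(frame) >= upload_threshold:
            remote.append(frame)
        else:
            local.append(frame)
    return local, remote


# Assumed frame records with a precomputed onboard motion score.
frames = [{"id": 1, "motion": 0.1},
          {"id": 2, "motion": 0.8},
          {"id": 3, "motion": 0.5}]

local, remote = split_processing(frames, lambda f: f["motion"])
print([f["id"] for f in local])   # [1]
print([f["id"] for f in remote])  # [2, 3]
```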

Although a few variations have been described in detail above, other modifications or additions are possible. For example, the current subject matter is not limited to the security domain, but can extend to voice recognition and other domains including physical security, safety, or asset protection; the improvement and/or monitoring of retail environments; real-time sensor feeds; historic sensor feeds; multiple sensors; residential, education, medical, financial, entertainment, industrial, transportation, commercial, law enforcement, military, and/or governmental applications.

One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.

In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” In addition, use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.
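The interpretation given above amounts to enumerating every non-empty combination of the listed elements. A short sketch, using a hypothetical helper named `at_least_one_of`, makes the enumeration explicit:

```python
from itertools import combinations


def at_least_one_of(*items):
    """List every non-empty combination of the given items."""
    return [combo
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


print(at_least_one_of("A", "B"))
# [('A',), ('B',), ('A', 'B')]
print(len(at_least_one_of("A", "B", "C")))
# 7
```

For three items the seven combinations correspond to A alone, B alone, C alone, A and B, A and C, B and C, and all three together.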

The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

Claims

1. A method comprising:

receiving sensor data;

classifying the sensor data into a first class by at least requesting processing of a machine computational component, receiving a first result of the machine computation component, requesting processing of an agent computation component, and receiving a first result of the agent computation component, the agent computation component including a platform to query an agent;

classifying the sensor data into a second class by at least requesting processing of the machine computational component, receiving a second result of the machine computation component, requesting processing of the agent computation component, and receiving a second result of the agent computation component;

applying a set of rules to the first class and the second class to enable a determination of a composite classification; and

providing the composite result.

2. The method of claim 1, wherein processing of the agent computation component is requested when a confidence of the machine computation component result is below a first threshold.

3. The method of claim 2, wherein processing of the agent computational component is requested when the confidence of the machine computation component result is above a second threshold.

4. The method of claim 1, wherein the set of rules includes matching sensor data within a predetermined time-window.

5. The method of claim 1, wherein the providing includes requesting further processing of the machine computation component result by the agent computation component.

6. The method of claim 1, wherein the machine computation component includes a deep learning artificial intelligence classifier.

7. The method of claim 6, wherein the machine computation component detects objects and classifies objects in the sensor data, the sensor data including an image.

8. The method of claim 1, wherein at least one of the receiving, classifying, and providing is performed by at least one data processor forming part of at least one computing system.

9. The method of claim 1, wherein the sensor data includes a first image of a first security system asset and a second image of a second security system asset.

10. A method comprising:

receiving first sensor data of a first security system asset and second sensor data of a second security system asset;

accessing a first predefined modality associated with the first security system asset and a second predefined modality associated with the second security system asset, the first modality defining a first computational task for analyzing the received first sensor data, the second modality defining a second computational task for analyzing the received second sensor data;

instantiating a first solution state machine object and a second solution state machine object, the first solution state machine object having a plurality of states and rules for transitioning between the plurality of states, the plurality of states including an initial state, a first intermediate state, a second intermediate state, and a terminal state;

determining a result of the first task and a result of the second task by executing the first task using the first solution state machine object and the second task using the second solution state machine object, the executing including requesting processing of the first task by a machine computation component and an agent computation component;

determining a composite result by applying a set of rules to the result of the first task and the result of the second task; and

providing the composite result.

11. The method of claim 10, wherein the executing of the first task includes:

requesting processing of the first task by, and receiving a result of, the machine computation component when a current state of the first solution state machine object is the first intermediate state, the result received from the machine computation component including a first confidence measure;

requesting processing of the first task by, and receiving a result of, the agent computation component when the current state of the first solution state machine object is the second intermediate state, the result received from the agent computation component including a second confidence measure; and

transitioning the current state of the first solution state machine object according to the transition rules and at least one of: the first confidence measure and the second confidence measure.

12. The method of claim 10, wherein the machine computation component executes a machine learning algorithm to perform the task.

13. The method of claim 10, wherein the machine computation component includes a deep learning neural network or a convolutional neural network.

14. The method of claim 10, wherein the agent computation component includes a platform that queries at least one agent, receives a query result, determines a confidence measure of the agent, and determines the second confidence measure using the confidence measure of the queried agent.

15. The method of claim 10, wherein the sensor data includes an image including a single image, a series of images, or a video; and the computational task includes: detecting a pattern in the image; detecting a presence of an object within the image; detecting a presence of a person within the image; detecting intrusion of the object or person within a region of the image; detecting suspicious behavior of the person within the image; detecting an activity of the person within the image; detecting an object carried by the person; detecting a trajectory of the object or the person in the image; a status of the object or person in the image; identifying whether a person who is detected is on a watch list; determining whether a person or object has loitered for a certain amount of time; detecting interaction among persons or objects; tracking a person or object; determining status of a scene or environment; determining the sentiment of one or more people; counting the number of objects or people; determining whether a person appears to be lost; determining whether an event is normal or abnormal; and/or determining whether text matches that in a database.

16. The method of claim 10, wherein the security system asset is an imaging device, a video camera, a still camera, a radar imaging device, a microphone, a chemical sensor, an acoustic sensor, a radiation sensor, a thermal sensor, a pressure sensor, a force sensor, or a proximity sensor.

17. The method of claim 10, wherein the modality defines: solution state machine object attributes, acceptable confidence for reaching the terminal state, a set of assets that trigger the modality, and agent query structure.

18. The method of claim 10, wherein executing the task includes posting, via a messaging queuing protocol, requested processing tasks, and wherein the machine computation component and agent computation component are microservices operating on tasks posted via the messaging queue protocol.

19. The method of claim 10, further comprising:

modifying a predictive model of the machine computation component using the result received from the agent computation component as a supervisory signal and the received sensor data as input.

20. The method of claim 10, wherein at least one of the receiving, accessing, instantiating, executing, and providing is performed by at least one data processor forming part of at least one computing system.

21. A non-transitory computer program product which, when executed by at least one data processor forming part of at least one computer, results in operations comprising:

receiving sensor data;

classifying the sensor data into a first class by at least requesting processing of a machine computational component, receiving a first result of the machine computation component, requesting processing of an agent computation component, and receiving a first result of the agent computation component, the agent computation component including a platform to query an agent;

classifying the sensor data into a second class by at least requesting processing of the machine computational component, receiving a second result of the machine computation component, requesting processing of the agent computation component, and receiving a second result of the agent computation component;

applying a set of rules to the first class and the second class to enable a determination of a composite classification; and

providing the composite result.

22. The computer program product of claim 21, wherein processing of the agent computation component is requested when a confidence of the machine computation component result is below a first threshold.

23. The computer program product of claim 22, wherein processing of the agent computational component is requested when the confidence of the machine computation component result is above a second threshold.

24. The computer program product of claim 21, wherein the set of rules includes matching sensor data within a predetermined time-window.

25. The computer program product of claim 21, wherein the providing includes requesting further processing of the machine computation component result by the agent computation component.

26. The computer program product of claim 21, wherein the machine computation component includes a deep learning artificial intelligence classifier.

27. A system comprising:

a media processor that receives sensor data;

means for classifying the sensor data into a first class by at least requesting processing of a machine computational component, receiving a first result of the machine computation component, requesting processing of an agent computation component, and receiving a first result of the agent computation component, the agent computation component including a platform to query an agent;

means for classifying the sensor data into a second class by at least requesting processing of the machine computational component, receiving a second result of the machine computation component, requesting processing of the agent computation component, and receiving a second result of the agent computation component; and

means for applying a set of rules to the first class and the second class to enable a determination of a composite classification.

28. The system of claim 27, wherein the set of rules includes matching sensor data within a predetermined time-window.

29. The system of claim 27, wherein the sensor data includes a first image of a first security system asset and a second image of a second security system asset.