WEAPON DETECTION AND TRACKING
A method detects and tracks weapons. A frame of a video is received from a camera. A weapon in the frame is detected using a weapon detection model. The weapon from the frame is classified using a weapon match classifier. A weapon alert is generated in response to classifying the weapon. The video is presented in response to the weapon alert.
This application claims priority to U.S. Provisional Application No. 62/956,867, filed on Jan. 3, 2020, which is hereby incorporated by reference herein.
BACKGROUND
Cameras capture images and video of people in environments. A challenge is to detect weapons from the images using automated systems.
SUMMARY
In general, in one or more aspects, the disclosure relates to a method that detects and tracks weapons. A frame of a video is received from a camera. A weapon in the frame is detected using a weapon detection model. The weapon from the frame is classified using a weapon match classifier. A weapon alert is generated in response to classifying the weapon. The video is presented in response to the weapon alert.
In general, in one or more aspects, the disclosure relates to a system that includes a server and an application. The server includes one or more processors and one or more memories. The application executes on one or more processors of the server, configured for detecting and tracking weapons. A frame of a video is received from a camera. A weapon in the frame is detected using a weapon detection model. The weapon from the frame is classified using a weapon match classifier. A weapon alert is generated in response to classifying the weapon. The video is presented in response to the weapon alert.
In general, in one or more aspects, the disclosure relates to a method that trains and uses machine learning models to detect weapons. A machine learning model that incorporates a weapon detection model to detect weapons from training data is trained. A weapon in a frame of a video is detected using the weapon detection model. The weapon from the frame is classified using a weapon match classifier. A weapon alert is generated in response to classifying the weapon. The video is presented in response to the weapon alert.
Other aspects of the invention will be apparent from the following description and the appended claims.
Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
In general, embodiments of the disclosure train and use machine learning models to detect and track weapons from images captured by camera systems. The system queues video frames (also referred to as frames) into message queues. Weapon detection workers (programs) process the images to detect the presence of weapons in the images using machine learning models. Face detection and identification workers (programs) process the images to detect and identify faces of people in the images. When a weapon or an unknown person is detected, the system may generate alerts to identify the presence of weapons and unknown persons.
Turning to
The weapon detection worker (130) is a program that is part of the weapon detection application (104). The weapon detection worker (130) detects and classifies weapons within the video frame (131) (also referred to as a frame of video). The video frame (131) may be from the camera video stream (145) (of
The weapon detection model (132) detects the presence of weapons in images. In one embodiment, the weapon detection model (132) extracts data from the video frame (131) into a multidimensional multichannel array that is input to a convolutional neural network (CNN). The convolutional neural network outputs the detection of a weapon within the image. The output of the convolutional neural network may also identify the location of the weapon within the image. The output of the weapon detection model (132) is sent to the weapon image extractor (133). In one embodiment, the weapon detection worker (130) may use a frame section generator to split the video frame (131) into smaller sections that are processed by the weapon detection model (132).
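As a rough illustration of the frame-to-array step, the following is a minimal sketch, assuming an H x W x 3 frame array and a generic CNN callable. The wrapper class, the detection tuple layout, and the 0.5 confidence cutoff are assumptions for illustration, not part of the disclosure.

```python
import numpy as np

class WeaponDetectionModel:
    """Hypothetical wrapper around a CNN that returns weapon detections."""

    def __init__(self, cnn, confidence=0.5):
        self.cnn = cnn                # any callable: batch array -> detections
        self.confidence = confidence  # assumed score cutoff

    def detect(self, frame):
        # Convert the H x W x 3 frame into a multidimensional,
        # multichannel float array and add a batch dimension.
        array = np.asarray(frame, dtype=np.float32) / 255.0
        batch = np.expand_dims(array, axis=0)
        # Each detection is assumed to be (score, x0, y0, x1, y1).
        return [d for d in self.cnn(batch) if d[0] >= self.confidence]
```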
The weapon image extractor (133) receives the output from the weapon detection model (132) and generates the weapon image (134). In one embodiment, the output from the weapon detection model (132) identifies the location of the weapon shown in the video frame (131). The weapon image extractor (133) may crop the edges of the video frame (131) down to the location of the weapon within the video frame (131) and extract the weapon image (134) from the cropped video frame (131). The weapon image (134) is output from the weapon image extractor (133) and input to the weapon match classifier (135).
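A minimal cropping sketch, assuming the detector reports an (x0, y0, x1, y1) bounding box in pixel coordinates; the disclosure only states that the output identifies the weapon's location.

```python
import numpy as np

def extract_weapon_image(frame: np.ndarray, box) -> np.ndarray:
    """Crop the video frame down to the detected weapon's bounding box."""
    x0, y0, x1, y1 = (int(v) for v in box)
    return frame[y0:y1, x0:x1].copy()
```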
The weapon match classifier (135) performs a second check to determine whether a weapon is present in the video frame (131). The weapon match classifier (135) receives the weapon image (134), extracted from the video frame (131), and classifies the weapon image (134) against a plurality of vantage point trees (also referred to as trees). Vantage point trees use a vantage point tree algorithm to organize data. For example, a distance function is used to identify the distance between elements of a tree. Branches to one side of the tree indicate that the parent node and child node are closer than the vantage point threshold distance, and branches to the other side indicate that the parent node and child node are farther apart than the vantage point threshold distance.
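The sketch below shows a generic vantage point tree, not the patent's implementation: items within the median distance of the vantage point go to one branch, items beyond it go to the other, and a nearest-neighbor search only descends into the far branch when it could still contain a closer match. The dictionary node layout and the median threshold are assumptions.

```python
def build_vp_tree(items, distance):
    """Build a vantage point tree over `items` using metric `distance`."""
    if not items:
        return None
    vantage, rest = items[0], items[1:]
    if not rest:
        return {"vp": vantage, "mu": 0.0, "near": None, "far": None}
    dists = [distance(vantage, item) for item in rest]
    mu = sorted(dists)[len(dists) // 2]            # median as the threshold
    near = [item for item, d in zip(rest, dists) if d <= mu]
    far = [item for item, d in zip(rest, dists) if d > mu]
    return {"vp": vantage, "mu": mu,
            "near": build_vp_tree(near, distance),
            "far": build_vp_tree(far, distance)}

def nearest(node, query, distance, best=(float("inf"), None)):
    """Return (distance, item) of the stored example closest to `query`."""
    if node is None:
        return best
    d = distance(node["vp"], query)
    if d < best[0]:
        best = (d, node["vp"])
    first, second = ("near", "far") if d <= node["mu"] else ("far", "near")
    best = nearest(node[first], query, distance, best)
    if abs(d - node["mu"]) < best[0]:              # other branch may still win
        best = nearest(node[second], query, distance, best)
    return best
```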
The vantage point trees include the pistol tree (137), the rifle tree (138), and the holster tree (139). The nodes of the pistol tree (137) include examples of a pistol, the nodes of the rifle tree (138) include examples of a rifle, and the nodes of the holster tree (139) include examples of a holster. When the weapon match classifier (135) identifies a match between the weapon image (134) and an example from one of the trees (137), (138), and (139), the weapon match classifier (135) generates an event that is sent to the event queue (141). In one embodiment, the event includes a camera identifier (identifying the camera that generated the video frame (131)), a time stamp (identifying the time and date the video frame (131) was captured), and the video frame (131).
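A sketch of the event payload described above (a camera identifier, a time stamp, and the frame). The dictionary field names and the use of Python's standard queue module are illustrative choices, not specified by the disclosure.

```python
import queue
import time

event_queue = queue.Queue()

def queue_weapon_event(camera_id, frame, weapon_type):
    """Queue a weapon event for the alert generator to pick up."""
    event_queue.put({
        "camera_id": camera_id,      # which camera produced the frame
        "timestamp": time.time(),    # when the frame was captured
        "frame": frame,              # the video frame containing the weapon
        "weapon_type": weapon_type,  # e.g. "pistol", "rifle", "holster"
    })
```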
The event queue (141) receives events from the weapon match classifier (135). In one embodiment, the event queue (141) may receive events from multiple weapon detection workers operating as part of the weapon detection application (104) to process multiple frames from multiple videos of multiple cameras connected to the system (100) (of
The weapon alert generator (142) is a program that monitors the event queue (141). When the weapon alert generator (142) identifies a new event in the event queue (141) from the weapon match classifier (135), the weapon alert generator (142) generates an alert, also referred to as a weapon alert. The weapon alert may identify the type of weapon, the camera that captured the image of the weapon, and the video frame (131) that includes the weapon image (134).
Turning to
The frame selector (146) is a program that operates as part of the camera pipeline application (103). The frame selector (146) selects frames from the camera video streams (144). The frame selector (146) may select frames on a periodic basis (e.g., one frame every 2 seconds). The frame selector (146) passes the frames from the camera video streams (144) to the face detection queue (147), the weapon detection queue (149), and the frame queue (153). The frame selector (146) may use a different selection frequency (0.2 hertz, 0.5 hertz, 2 hertz, etc.) for each of the queues (147), (149), and (153).
The face detection queue (147) is a program operating as part of the camera pipeline application (103). The face detection queue (147) receives video frames from the frame selector (146). After receiving the frames, the face detection workers (151), including the face detection worker (148) (described in
The weapon detection queue (149) is a program operating as part of the camera pipeline application (103). The weapon detection queue (149) receives video frames from the frame selector (146). After the weapon detection queue (149) receives the frames, the weapon detection workers (150), including the weapon detection worker (130) (described in
The event queue (141) is a program operating as part of the camera pipeline application (103). The event queue (141) receives events from the face detection workers (151) and the weapon detection workers (150). The events identify the frames from the camera video streams (144) that include particular events. For example, an event may identify that an unknown visitor is detected in a video frame or that a weapon is detected in a video frame.
The frame queue (153) is a program that operates under the camera pipeline application (103). The frame queue (153) receives the frames from the frame selector (146) that are input to the sorter (154).
The sorter (154) is a program that operates under the camera pipeline application (103). The sorter (154) orders the video from the camera video streams (144) for playback on the player (156). In one embodiment, the sorter (154) processes the events in the event queue (141) to prioritize the frames and videos. For example, when a weapon event is detected, the frame and video associated with that weapon event may be given a higher priority than the other videos. Additionally, when an unknown visitor is detected, the frame and video including the unknown visitor may be given a higher priority than the other videos. The sorter (154) pushes video from the camera video streams (144) to the player queue (155).
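A simplified ordering rule consistent with the priorities described above; the event structure, the "kind" field, and the numeric priority values are assumptions made for illustration.

```python
def order_streams(camera_ids, events):
    """Order camera streams: weapon events first, then unknown-visitor
    events, then everything else (lower value = higher priority)."""
    def priority(camera_id):
        kinds = {e["kind"] for e in events if e["camera_id"] == camera_id}
        if "weapon" in kinds:
            return 0
        if "unknown_visitor" in kinds:
            return 1
        return 2
    return sorted(camera_ids, key=priority)
```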
The player queue (155) is a program that operates under the camera pipeline application (103). The player queue (155) receives sequences of video from the sorter (154) that may be prioritized based on the events in the event queue (141) detected by the face detection workers (151) and the weapon detection workers (150). The videos identified by the player queue (155) are presented to the player (156). In one embodiment, the video may be stored as part of the image data (121) (of
The player (156) is a program that operates under the camera pipeline application (103). The player (156) presents video to the user device (117) (of
Turning to
The face detection worker (148) may be one of multiple face detection workers operating as part of the face detection application (105). The face detection worker (148) detects faces from the frame sections (168) and extracts the face images (164), including the face image (163), from the frame sections (168). In one embodiment, the face images (164) are extracted directly from the video frame (131). The face detection worker (148) includes the frame section generator (159).
The frame section generator (159) extracts the frame sections (168), including the frame section (160), from the video frame (131). Increasing the number of sections may increase the accuracy of face detection. In one embodiment, the video frame (131) may be split into 16 non-overlapping sections. In one embodiment, the different frame sections (168) may include overlapping subsections of the video frame (131) to improve the accuracy of detecting faces along frame section boundaries.
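A sketch of the sectioning step: a 4x4 grid gives the 16 sections mentioned above, and the overlap fraction (used to catch faces on section boundaries) is an assumption.

```python
import numpy as np

def generate_frame_sections(frame: np.ndarray, rows=4, cols=4, overlap=0.1):
    """Split a frame into rows x cols sections with fractional overlap."""
    height, width = frame.shape[:2]
    sec_h, sec_w = height // rows, width // cols
    pad_h, pad_w = int(sec_h * overlap), int(sec_w * overlap)
    sections = []
    for r in range(rows):
        for c in range(cols):
            y0, x0 = max(r * sec_h - pad_h, 0), max(c * sec_w - pad_w, 0)
            y1 = min((r + 1) * sec_h + pad_h, height)
            x1 = min((c + 1) * sec_w + pad_w, width)
            # Keep the offset so detections can be mapped back to the frame.
            sections.append(((x0, y0), frame[y0:y1, x0:x1]))
    return sections
```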
The face detection model (161) is a program of the face detection worker (148). The face detection model (161) receives the frame sections (168) and detects the presence of faces within each of the frame sections (168). Each of the frame sections (168) may include multiple faces that are detected with the face detection model (161). In one embodiment, the face detection model (161) is a machine learning model that includes a convolutional neural network. The output of the face detection model (161) may identify the locations of faces within the frame section (160) and is sent to the face extractor (162).
The face extractor (162) is a program of the face detection worker (148). The face extractor (162) receives the locations of faces within the video frame (131). The locations of the faces may be relative to either the frame section (160) or directly to the video frame (131). The face extractor (162) extracts the face images (164) from the video frame (131) using the face locations received from the face detection model (161). In one embodiment, the face extractor (162) normalizes the size and resolution of the face images (164). The normalization may include extrapolating the values of pixels from a lower resolution to a higher resolution or vice versa. The face images (164) are sent to the face identification queue (165).
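A sketch of the crop-and-normalize step, assuming an (x0, y0, x1, y1) face box and a fixed output resolution; the 160x160 target size is an assumption, as the disclosure only states that size and resolution are normalized.

```python
import cv2
import numpy as np

def extract_face(frame: np.ndarray, box, size=(160, 160)) -> np.ndarray:
    """Crop a detected face from the frame and normalize its resolution."""
    x0, y0, x1, y1 = (int(v) for v in box)
    face = frame[y0:y1, x0:x1]
    # cv2.resize scales pixel values up or down to the target size.
    return cv2.resize(face, size, interpolation=cv2.INTER_LINEAR)
```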
The face identification queue (165) receives the face images (164) from the face detection worker (148). In one embodiment, the face identification queue (165) may receive face images from multiple face detection workers. The face images (164) in the face identification queue (165) are operated on by the face identification workers (166). The face identification workers (166) include the face identification worker (167), which is further described in
Turning to
The face identification worker (167) is a program of the face identification application (106). The face identification worker (167) receives the face image (163) and matches the face to embeddings from embedding trees, which include the student tree (176), the staff tree (177), the parent tree (178), the visitor tree (179), and the unknown visitor tree (180). The face identification worker (167) includes the embedding model (172).
The embedding model (172) is a program of the face identification worker (167). The embedding model (172) generates the face embedding vector (173) from the face image (163) using a machine learning model. In one embodiment, the embedding model (172) uses a trunked convolutional neural network to generate the face embedding vector (173) from the face image (163). The face embedding vector (173) is input to the face match classifier (174).
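A sketch of the embedding step, with the trunked CNN passed in as a callable; the batch handling, vector length, and unit normalization are assumptions.

```python
import numpy as np

def embed_face(face_image: np.ndarray, trunk) -> np.ndarray:
    """Map a normalized face image to a face embedding vector.

    `trunk` stands in for the trunked CNN (classification head removed);
    it is assumed to take a batch array and return one vector per image.
    """
    batch = np.expand_dims(face_image.astype(np.float32) / 255.0, axis=0)
    embedding = np.asarray(trunk(batch))[0]
    return embedding / np.linalg.norm(embedding)   # unit-normalize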
The face match classifier (174) is a program of the face identification application (106). The face match classifier (174) compares the face embedding vector (173) to multiple embedding trees. Different trees may correspond to different categories of people. In one embodiment, the different trees correspond to the different types of people that may be present at an educational facility. For example, the embedding trees may include the student tree (176), the staff tree (177), the parent tree (178), the visitor tree (179), and the unknown visitor tree (180). The nodes of the student tree (176) include embeddings that correspond to students that attend the educational facility. The nodes of the staff tree (177) correspond to the staff members of the educational facility. The nodes of the parent tree (178) correspond to the parents of the students that attend the educational facility. The nodes of the visitor tree (179) correspond to the visitors to the educational facility. The nodes of the unknown visitor tree (180) correspond to unknown visitors of the educational facility. The visitor tree (179) and the unknown visitor tree (180) may be continuously updated during a day. The student tree (176), the staff tree (177), and the parent tree (178) may be periodically updated (each semester, each year, etc.).
Different types of trees may be used. For example, if the system (100) were installed at a business, the embedding trees may include an executive tree, an employee tree, a contractor tree, a visitor tree, and an unknown visitor tree.
The face match classifier (174) compares the face embedding vector (173) to the embeddings of the trees (176), (177), (178), (179), and (180). When a match is found, an event is queued to the event queue (141). When a match is not found, the face embedding vector (173) may be sent to the unknown queue (182).
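Tying these pieces together, the following is a hedged sketch of the per-tree search: the `nearest` helper is the vantage point search sketched earlier, the tree dictionary keys stand in for the category trees, and the match threshold value is an assumption.

```python
def classify_face(embedding, trees, nearest, distance, threshold=0.6):
    """Search each category tree (student, staff, parent, visitor, ...)
    for an embedding close enough to count as a match.

    Returns (category, matched_embedding), or (None, None) so the caller
    can route the embedding to the unknown queue.
    """
    for category, tree in trees.items():
        d, match = nearest(tree, embedding, distance)
        if match is not None and d <= threshold:
            return category, match
    return None, None
```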
The unknown queue (182) is a program of the face identification application (106). The unknown queue (182) receives face embedding vectors that did not match any of the trees used by the face match classifier (174) of the face identification worker (167). The unknown queue (182) is serviced by the unknown worker (185).
The unknown worker (185) is a program of the face identification application (106). The unknown worker (185) manages the trees used by the system (100) in response to items in the unknown queue (182). The unknown worker (185) includes the unknown visitor manager (187).
The unknown visitor manager (187) receives the face embedding vector (186) from the unknown queue (182). The unknown visitor manager (187) may supplement the unknown visitor tree (180) with the face embedding vector (186). The unknown visitor manager (187) sends events to the event queue (141) based on handling the face embedding vector (186). For example, the face embedding vector (186) may belong to a person that has not been identified by the system (100) (of
The event queue (141) is a program of the face identification application (106). The event queue (141) receives events from the face match classifier (174) and from the unknown visitor manager (187).
The face alert generator (188) generates alerts based on the entries in the event queue (141). For example, when a person whose face is not recognized by the system is detected, the face alert generator (188) may create an unknown visitor alert that identifies the video frame (131) (of
Turning to
The machine learning model (192) takes the training input (190), generates the training output (194) from the training input (190), uses the update function (196) to compare the training output (194) to the expected output (198), and updates the machine learning model (192). The updates may be in accordance with, or proportional to, the errors observed between the training output (194) and the expected output (198) by the update function (196).
In one embodiment, the machine learning model (192) includes a neural network model that uses forward propagation to generate the training output (194) from the training input (190). The update function (196) uses backpropagation to update the weights of the neural network of the machine learning model (192) based on the error between the training output (194) and the expected output (198). Different models with different algorithms for updating the parameters of the models may be used.
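A minimal training loop consistent with the forward propagation and backpropagation description above; the PyTorch framework, the Adam optimizer, and the cross-entropy loss are illustrative choices, not specified by the disclosure.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs=10, lr=1e-3):
    """Update model weights from (training_input, expected_output) pairs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for training_input, expected_output in loader:
            optimizer.zero_grad()
            training_output = model(training_input)            # forward pass
            loss = criterion(training_output, expected_output)
            loss.backward()                                    # backpropagation
            optimizer.step()                                   # apply updates
```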
Turning to
The camera systems (113), including the camera system (112), capture and transmit the camera video streams (144) (of
The server (101) is an embodiment of the computing system (500) and the nodes (522) and (524) of
The server application (102) is a program on the server (101). The server application (102) includes multiple programs used by the system (100) to interact with the user device (117) and present data to a user of the user device (117). The server application (102) may include the camera pipeline application (103) (further described in
The training application (108) (further described in
The repository (120) is a computing system that may include multiple computing devices in accordance with the computing system (500) and the nodes (522) and (524) described below in
The image data (121) is data received by the system (100) from the camera systems (113). The image data (121) in the repository (120) may include raw data and may include processed data. For example, the raw data may include data (images, video, etc.) transmitted directly from the camera systems (113) that is received by the server (101) and stored in the repository (120). The processed data may include data from the camera systems (113) that have been processed, which may include locations of weapons, persons, faces, etc.
The machine learning model data (122) may include the code and data that form the machine learning models used by the system (100). For example, the weights of the models may be part of the machine learning model data (122), including the weights of a neural network. The machine learning models of the system include mathematical models, which may include linear models, exponential models, polynomial models, statistical models, neural network models, etc.
The tree data (123) includes multiple vantage point trees. The vantage point trees are used to classify images and embeddings, including face images, face embeddings, and weapon images.
The training data (124) is the data used to train the machine learning models of the system (100). The training data (124) may include pairs of training inputs and expected outputs and may also include the intermediate data generated to train and update the machine learning models of the system (100).
The data in the repository (120) may also include the web page (125) that is part of a website hosted by the system (100). The users and the developers may interact with the website using the user device (117) and the developer device (115) to access the server application (102) and the training application (108).
The developer device (115) is an embodiment of the computing system (500) and the nodes (522) and (524) of
The user device (117) is an embodiment of the computing system (500) and the nodes (522) and (524) of
The developer application (116) and the user application (118) may be web browsers or native applications that access the server application (102) and the training application (108) using resources hosted by the server (101). The developer application (116) and the user application (118) may additionally be web services that communicate with the server application (102) and the training application (108) using representational state transfer application programming interfaces (RESTful APIs). Although
Turning to
At Block 204, a weapon is detected in the frame using a weapon detection model. In one embodiment, a convolutional neural network is used to detect the weapon in the frame.
At Block 206, the weapon from the frame is classified using a weapon match classifier. The weapon match classifier may compare a weapon image from the frame to the nodes of weapon trees. Each weapon tree may correspond to a particular type of weapon or weapon accessory.
At Block 208, a weapon alert is generated in response to classifying the weapon. The weapon alert may identify the camera system that captured the weapon, the time and date that the weapon was captured with the camera system, and a physical location of the weapon, which may correspond to, and be expressed with respect to, the physical location of the camera system.
At Block 210, the video is presented in response to the weapon alert. For example, the system may identify the weapon from a weapon image extracted from the frame and identify a video stream from which the frame was selected. The system may then play the video stream by transmitting the video stream to a user device that displays the video stream.
Turning to
At Block 303, data is decoded. As an example, the data from the frame is decoded into an array for input to a machine learning model.
At Block 304, a weapon is detected within the image. As an example, the array may be processed by executing a “weapon_detection” function that inputs the array to a weapon detection model. The “weapon_detection” function returns bounding box coordinates for each weapon detected in the image.
At Block 305, if no weapon was detected, then the process (300) ends. Otherwise, the process continues to Block 307.
At Block 307, weapon images are extracted based on the detection of weapons in the image. As an example, a “crop_image” function may be invoked for each weapon detection to cut the original image down to the part of the image within the bounding box for the detected weapon and generate a weapon image from the contents of the bounding box. Each weapon image is searched in multiple vantage point (VP) trees (e.g., a pistol tree, a rifle tree, a holster tree, etc.) to identify the node of the tree that is the closest match to an individual weapon image. A distance function is used that identifies a distance between the weapon image and the example images used to generate the tree.
At Block 308, the cropped image is searched against multiple vantage point trees to classify the weapon of the cropped image. Each tree corresponds to a classification. For example, a pistol tree corresponds to pistols. When the cropped image matches with the pistol tree, the weapon in the cropped image is classified as a pistol.
At Block 310, a weapon image is searched against a pistol tree. If the weapon image is matched to a node of the pistol tree, then the process (300) proceeds to Block 315. Otherwise the process (300) proceeds to Block 311.
At Block 311, a weapon image is searched against a rifle tree. If the weapon image is matched to a node of the rifle tree, then the process (300) proceeds to Block 315. Otherwise the process (300) proceeds to Block 312.
At Block 312, a weapon image is searched against a holster tree. If the weapon image is matched to a node of the holster tree, then the process (300) proceeds to Block 313. Otherwise the process (300) ends.
At Block 313, a log entry is created. The log entry identifies that a holster has been identified in a weapon image. In one embodiment, the word “holster” may be printed to the log entry.
At Block 315, a weapon has been detected and information about the weapon is queued to an event queue. As an example, a camera identifier that identifies the camera system that captured the original image, a timestamp that identifies the date and time that the image was captured, and the weapon image cropped from the original image, are encapsulated into a message sent to an event queue.
At Block 317, a weapon alert is sent. The weapon alert indicates that a weapon has been detected by this system.
Turning to
At Block 323, video frames are selected. The video frames may be selected from the video streams at a constant rate. For example, each stream may have a video frame pulled every 2 seconds (a frequency of 0.5 hertz). Different video streams from different video cameras may have video frames selected at different frequencies.
At Block 325, frames are queued to a face detection queue. The face detection queue may be a message queue that is serviced by a face detection worker.
At Block 326, face detection is executed. The face detection is performed by a face detection worker using a machine learning model.
At Block 327, frames are queued in a weapon detection queue. The weapon detection queue may be a message queue that is serviced by a weapon detection worker.
At Block 328, weapon detection is executed. The weapon detection is performed by a weapon detection worker using a machine learning model.
At Block 330, detections are queued. The detections of faces and weapons in the video frames from the video streams are queued into an event queue. Each face detection and weapon detection may identify the frame, camera system, time and date, and location that correspond to the event that was detected. Each face detection and weapon detection may be encapsulated into a message that is sent to the event queue, from which alerts may be generated.
At Block 331, frames are queued into a frame queue. The frame queue stores the video frames that are selected from the video streams.
At Block 332, frames are ordered and streams are created. The frames in the frame queue may be ordered based on the face detections and weapon detections in the event queue. In one embodiment, frames (and the corresponding video streams) with a face detection event may be given priority over frames that do not include detection events. Frames with weapon detection events may be given priority over frames that do not include weapon detection events. After prioritizing the video based on the frames and detection events, a video stream may be created that includes the video stream with the highest priority determined from ordering the frames based on the detected events.
At Block 334, streams are queued. Multiple streams may be queued for multiple user devices to display the video streams perceived by the system.
At Block 336, streams are served. The system serves and transmits the video streams to the user devices that display the streams.
Turning to
At Block 354, data is transformed. The video frame may be transformed into a multidimensional multichannel array with horizontal and vertical dimensions and channels for red, green, and blue.
At Block 356, sections are calculated for the received frame. The number of sections into which the frame is to be subdivided is identified, and the length, width, and offset of each section are determined with reference to the original frame. Each section may then be extracted from the original frame using its length, width, and offset.
At Block 358, detections are obtained for each section. In one embodiment, each section is extracted from the original frame and input to a machine learning model. The machine learning model may output the presence and location of a face within the section of the frame. The output may include the coordinates for a bounding box that surrounds the image of the face within the section of the frame.
At Block 360, if no face is detected, then the process (350) ends. Otherwise, the process (350) continues to Block 362.
At Block 362, faces detected from the original frame are cropped. In one embodiment, the bounding box output from the machine learning model, which identifies the location of a face, is used to crop out and remove the portions of the image that do not correspond to a face detected in the image. Multiple faces may be detected in a single section of an image, and multiple face images may be cropped out from the original image.
At Block 364, face images are queued. The face images, which were cropped out from the original image, may be encapsulated into messages that are passed to a face identification queue.
At Block 366, face identification is executed. In one embodiment, a face identification worker services the messages in the face identification queue to identify the people in the face images detected from the original video frame. In one embodiment, the face identification worker uses a machine learning model to identify the type of person whose face is captured in the face image. As an example, when the system is installed in an educational facility, the types of persons may include students, staff, parents, visitors, and unknown visitors.
Turning to
At Block 373, a face image is transformed to a face embedding vector. The transformation may be performed by a machine learning model that takes in the multidimensional multichannel face image and outputs a vector. In one embodiment, the face embedding vector may have hundreds to thousands of elements.
At Block 374, a search is performed with a classifier. The face embedding vector may be searched by the face identification worker against multiple vantage point trees to identify the type of person captured in the face image. In one embodiment, the vantage point trees include a student tree, a staff tree, a visitor tree, and an unknown visitor tree. Each tree may correspond to a different type of person. Additional trees that may correspond to different types of people may be used.
At Block 376, if the face embedding vector matches to the student tree (indicating that the person is a student), then the process (370) proceeds to Block 380. Otherwise the process continues to Block 377.
At Block 377, if the face embedding vector matches to the staff tree (indicating that the person is a staff member), then the process (370) proceeds to Block 380. Otherwise the process continues to Block 378.
At Block 378, if the face embedding vector matches to the visitor tree (indicating that the person is a visitor), then the process (370) proceeds to Block 380. Otherwise the process continues to Block 379.
At Block 379, if the face embedding vector matches to the unknown visitor tree (indicating that the person is an unknown visitor, e.g., someone who has not signed into the facility), then the process (370) proceeds to Block 380. Otherwise the process continues to Block 381.
At Block 380, a message is queued to an event queue. The message indicates the type of person identified from the face image.
At Block 381, a message is queued to an unknown queue. The message may include the face embedding vector. The unknown queue is serviced by an unknown worker, which is a program that manages the vantage point trees of the system based on the messages passed to the unknown queue.
At Block 382, a counter is initialized and a tree is cleared. For example, the unknown worker, which handles the unknown queue, may be reset every day along with the unknown visitor tree.
At Block 383, data is transformed. The face embedding vector is recovered from the message in the unknown queue. For example, the message to the unknown queue may have stored the face embedding vector as string values, which are converted to floating point values.
At Block 385, a tree embedding threshold is checked. The tree embedding threshold identifies a minimum number of embeddings for the unknown visitor tree. The embeddings that form the unknown visitor tree are face embedding vectors that have the same structure as the face embedding vector from the unknown queue. When the number of embeddings in the unknown visitor tree is less than the tree embedding threshold, the process (370) proceeds to Block 387. Otherwise the process (370) proceeds to Block 390.
At Block 387, an embedding is added to the tree. For example, the face embedding vector from the unknown queue may be added to the unknown visitor tree.
At Block 388, an embedding and an identifier are saved. The embedding is the face embedding vector from the unknown queue and the identifier is the identifier for the person captured in the face image from which the face embedding vector was generated.
At Block 390, an embedding is added to the tree and an identifier is assigned. The embedding is the face embedding vector from the unknown queue and the identifier is a new identifier.
At Block 391, a minimum match threshold is checked. The minimum match threshold identifies the minimum degree of closeness needed for the face embedding vector from the unknown queue to match an embedding in the unknown visitor tree. The degree of closeness may be determined by a distance function which may be the distance function used by the vantage point tree algorithm to generate the vantage point trees used by the system. The distance function may identify the number of embeddings in the tree that match to the face embedding vector. When the minimum match threshold is met, the process (370) proceeds to Block 392. Otherwise the process (370) proceeds to Block 388.
At Block 392, matched embeddings are joined. For example, the face embedding vector from the unknown queue may match with multiple embeddings already present in the vantage point tree. In this case, the embeddings from the tree are joined with the face embedding vector and may be treated as a group.
At Block 393, an identifier is assigned. In one embodiment, the identifier is a subsequent identifier that is different from the new identifier assigned at Block 390. For example, the subsequent identifier may be assigned to the face embedding vector from the unknown queue and to the other embeddings that matched to the face embedding vector that were joined at Block 392.
At Block 394, an embedding is saved to a database. In one embodiment, the face embedding vector from the unknown queue is saved to a database of embeddings.
At Block 395, a message is queued to an event queue. The message indicates that a new unknown visitor has been added to the system.
At Block 396, an unknown alert is sent. The unknown alert indicates that a new unknown visitor has been identified and may identify the camera system, the time and date, and the video frame in which the new unknown visitor was initially identified.
Turning to
Turning to
The displayed image (452) may be a modified version of a video frame from the camera system (402) (of
The box (460) is a bounding box identified by a machine learning model at the location of a face of the person (412). The portion of the displayed image (452) within the box (460) may form the face image that is analyzed by the server (408) (of
The alert (470) is an unknown visitor alert. The alert (470) identifies that the person (412) is an unknown visitor to the facility in which the system was installed.
The box (462) is a bounding box identified by a machine learning model at the location of the weapon (418). The portion of the displayed image (452) within the box (462) may form the weapon image that is analyzed by the server (408) (of
The alert (472) is a weapon alert. The alert (472) identifies that the weapon (418) has been identified by the system.
Embodiments of the disclosure may be implemented on a computing system specifically designed to achieve an improved technological result. When implemented in a computing system, the features and elements of the disclosure provide a significant technological advancement over computing systems that do not implement the features and elements of the disclosure. Any combination of mobile, desktop, server, router, switch, embedded device, or other types of hardware may be improved by including the features and elements described in the disclosure. For example, as shown in
The computer processor(s) (502) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a processor. The computing system (500) may also include one or more input devices (510), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device.
The communication interface (512) may include an integrated circuit for connecting the computing system (500) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, mobile network, or any other type of network) and/or to another device, such as another computing device.
Further, the computing system (500) may include one or more output devices (508), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same or different from the input device(s). The input and output device(s) may be locally or remotely connected to the computer processor(s) (502), non-persistent storage (504), and persistent storage (506). Many different types of computing systems exist, and the aforementioned input and output device(s) may take other forms.
Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium. Specifically, the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform one or more embodiments of the invention.
The computing system (500) in
Although not shown in
The nodes (e.g., node X (522), node Y (524)) in the network (520) may be configured to provide services for a client device (526). For example, the nodes may be part of a cloud computing system. The nodes may include functionality to receive requests from the client device (526) and transmit responses to the client device (526). The client device (526) may be a computing system, such as the computing system shown in
The computing system or group of computing systems described in
Based on the client-server networking model, sockets may serve as interfaces or communication channel end-points enabling bidirectional data transfer between processes on the same device. Foremost, following the client-server networking model, a server process (e.g., a process that provides data) may create a first socket object. Next, the server process binds the first socket object, thereby associating the first socket object with a unique name and/or address. After creating and binding the first socket object, the server process then waits and listens for incoming connection requests from one or more client processes (e.g., processes that seek data). At this point, when a client process wishes to obtain data from a server process, the client process starts by creating a second socket object. The client process then proceeds to generate a connection request that includes at least the second socket object and the unique name and/or address associated with the first socket object. The client process then transmits the connection request to the server process. Depending on availability, the server process may accept the connection request, establishing a communication channel with the client process, or the server process, busy in handling other operations, may queue the connection request in a buffer until server process is ready. An established connection informs the client process that communications may commence. In response, the client process may generate a data request specifying the data that the client process wishes to obtain. The data request is subsequently transmitted to the server process. Upon receiving the data request, the server process analyzes the request and gathers the requested data. Finally, the server process then generates a reply including at least the requested data and transmits the reply to the client process. The data may be transferred, more commonly, as datagrams or a stream of characters (e.g., bytes).
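A minimal Python illustration of the create/bind/listen/connect/request/reply sequence described above; the host, port, and message contents are placeholders.

```python
import socket

def serve_once(host="127.0.0.1", port=5000):
    """Server side: bind, listen, accept one connection, and reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, port))
        server.listen(1)
        conn, _ = server.accept()
        with conn:
            request = conn.recv(1024)            # the client's data request
            conn.sendall(b"reply: " + request)   # gather and return the data

def request_data(host="127.0.0.1", port=5000):
    """Client side: connect, send a data request, and read the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect((host, port))
        client.sendall(b"get_data")
        return client.recv(1024)
```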
Shared memory refers to the allocation of virtual memory space in order to substantiate a mechanism for which data may be communicated and/or accessed by multiple processes. In implementing shared memory, an initializing process first creates a shareable segment in persistent or non-persistent storage. Post creation, the initializing process then mounts the shareable segment, subsequently mapping the shareable segment into the address space associated with the initializing process. Following the mounting, the initializing process proceeds to identify and grant access permission to one or more authorized processes that may also write and read data to and from the shareable segment. Changes made to the data in the shareable segment by one process may immediately affect other processes, which are also linked to the shareable segment. Further, when one of the authorized processes accesses the shareable segment, the shareable segment maps to the address space of that authorized process. Often, only one authorized process may mount the shareable segment, other than the initializing process, at any given time.
Other techniques may be used to share data, such as the various data described in the present application, between processes without departing from the scope of the invention. The processes may be part of the same or different application and may execute on the same or different computing system.
Rather than or in addition to sharing data between processes, the computing system performing one or more embodiments of the invention may include functionality to receive data from a user. For example, in one or more embodiments, a user may submit data via a graphical user interface (GUI) on the user device. Data may be submitted via the graphical user interface by a user selecting one or more graphical user interface widgets or inserting text and other data into graphical user interface widgets using a touchpad, a keyboard, a mouse, or any other input device. In response to selecting a particular item, information regarding the particular item may be obtained from persistent or non-persistent storage by the computer processor. Upon selection of the item by the user, the contents of the obtained data regarding the particular item may be displayed on the user device in response to the user's selection.
By way of another example, a request to obtain data regarding the particular item may be sent to a server operatively connected to the user device through a network. For example, the user may select a uniform resource locator (URL) link within a web client of the user device, thereby initiating a Hypertext Transfer Protocol (HTTP) or other protocol request being sent to the network host associated with the URL. In response to the request, the server may extract the data regarding the particular selected item and send the data to the device that initiated the request. Once the user device has received the data regarding the particular item, the contents of the received data regarding the particular item may be displayed on the user device in response to the user's selection. Further to the above example, the data received from the server after selecting the URL link may provide a web page in Hyper Text Markup Language (HTML) that may be rendered by the web client and displayed on the user device.
Once data is obtained, such as by using techniques described above or from storage, the computing system, in performing one or more embodiments of the invention, may extract one or more data items from the obtained data. For example, the extraction may be performed as follows by the computing system in
Next, extraction criteria are used to extract one or more data items from the token stream or structure, where the extraction criteria are processed according to the organizing pattern to extract one or more tokens (or nodes from a layered structure). For position-based data, the token(s) at the position(s) identified by the extraction criteria are extracted. For attribute/value-based data, the token(s) and/or node(s) associated with the attribute(s) satisfying the extraction criteria are extracted. For hierarchical/layered data, the token(s) associated with the node(s) matching the extraction criteria are extracted. The extraction criteria may be as simple as an identifier string or may be a query presented to a structured data repository (where the data repository may be organized according to a database schema or data format, such as XML).
The extracted data may be used for further processing by the computing system. For example, the computing system of
The computing system in
The user, or software application, may submit a statement or query into the DBMS. Then the DBMS interprets the statement. The statement may be a select statement to request information, update statement, create statement, delete statement, etc. Moreover, the statement may include parameters that specify data, data containers (database, table, record, column, view, etc.), identifiers, conditions (comparison operators), functions (e.g., join, full join, count, average, etc.), sorts (e.g., ascending, descending), or others. The DBMS may execute the statement. For example, the DBMS may access a memory buffer, a reference or index a file for read, write, deletion, or any combination thereof, for responding to the statement. The DBMS may load the data from persistent or non-persistent storage and perform computations to respond to the query. The DBMS may return the result(s) to the user or software application.
The computing system of
For example, a GUI may first obtain a notification from a software application requesting that a particular data object be presented within the GUI. Next, the GUI may determine a data object type associated with the particular data object, e.g., by obtaining data from a data attribute within the data object that identifies the data object type. Then, the GUI may determine any rules designated for displaying that data object type, e.g., rules specified by a software framework for a data object class or according to any local parameters defined by the GUI for presenting that data object type. Finally, the GUI may obtain data values from the particular data object and render a visual representation of the data values within a display device according to the designated rules for that data object type.
Data may also be presented through various audio methods. In particular, data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device.
Data may also be presented to a user through haptic methods. For example, haptic methods may include vibrations or other physical signals generated by the computing system. For example, data may be presented to a user using a vibration generated by a handheld computer device with a predefined duration and intensity of the vibration to communicate the data.
The above description of functions presents only a few examples of functions performed by the computing system of
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.
Claims
1. A method comprising:
- receiving a frame of a video from a camera;
- detecting a weapon in the frame using a weapon detection model;
- classifying the weapon from the frame using a weapon match classifier;
- generating a weapon alert in response to classifying the weapon; and
- presenting the video in response to the weapon alert.
2. The method of claim 1, further comprising:
- training one or more machine learning models that comprise the weapon detection model, the weapon match classifier, a face detection model, an embedding model, and a face match classifier.
3. The method of claim 1, further comprising:
- extracting a weapon image from the frame; and
- classifying the weapon using the weapon image.
4. The method of claim 1, further comprising:
- generating a weapon classification using a weapon image; and
- classifying the weapon using the weapon image.
5. The method of claim 1, further comprising:
- selecting a plurality of frames, including the frame, from the video at a predetermined rate;
- queueing the frame to a face detection queue, a weapon detection queue, and a frame queue;
- ordering output from an event queue and the frame queue to fill a player queue with the video; and
- presenting the video using the player queue and a player.
6. The method of claim 1, further comprising:
- detecting a face in the frame using a face detection model; and
- extracting a face image corresponding to the face from the frame.
7. The method of claim 1, further comprising:
- generating a face embedding from a face image;
- searching for a matching embedding using a face match classifier;
- identifying a selected tree of a plurality of trees that comprises the matching embedding; and
- queuing an event to an event queue in response to identifying the tree.
8. The method of claim 1, further comprising:
- managing an embedding tree of a plurality of trees in response to not identifying a selected tree, of the plurality of trees, that comprises a matching embedding.
9. A system comprising:
- a server comprising one or more processors and one or more memories; and
- an application, executing on one or more processors of the server, configured for: receiving a frame of a video from a camera; detecting a weapon in the frame using a weapon detection model; classifying the weapon from the frame using a weapon match classifier; generating a weapon alert in response to classifying the weapon; and presenting the video in response to the weapon alert.
10. The system of claim 9, wherein the application is further configured for:
- training one or more machine learning models that comprise the weapon detection model, the weapon match classifier, a face detection model, an embedding model, and a face match classifier.
11. The system of claim 9, wherein the application is further configured for:
- extracting a weapon image from the frame; and
- classifying the weapon using the weapon image.
12. The system of claim 9, wherein the application is further configured for:
- generating a weapon classification using a weapon image; and
- classifying the weapon using the weapon image.
13. The system of claim 9, wherein the application is further configured for:
- selecting a plurality of frames, including the frame, from the video at a predetermined rate;
- queueing the frame to a face detection queue, a weapon detection queue, and a frame queue;
- ordering output from an event queue and the frame queue to fill a player queue with the video; and
- presenting the video using the player queue and a player.
14. The system of claim 9, wherein the application is further configured for:
- detecting a face in the frame using a face detection model; and
- extracting a face image corresponding to the face from the frame.
15. The system of claim 9, wherein the application is further configured for:
- generating a face embedding from a face image;
- searching for a matching embedding using a face match classifier;
- identifying a selected tree of a plurality of trees that comprises the matching embedding; and
- queuing an event to an event queue in response to identifying the tree.
16. The system of claim 9, wherein the application is further configured for:
- managing an embedding tree of a plurality of trees in response to not identifying a selected tree, of the plurality of trees, that comprises a matching embedding.
17. A method comprising:
- training a machine learning model that incorporates a weapon detection model to detect weapons from training data;
- detecting a weapon in a frame of a video using the weapon detection model;
- classifying the weapon from the frame using a weapon match classifier;
- generating a weapon alert in response to classifying the weapon; and
- presenting the video in response to the weapon alert.
18. The method of claim 17, further comprising:
- extracting a weapon image from the frame; and
- classifying the weapon using the weapon image.
19. The method of claim 17, further comprising:
- generating a weapon classification using a weapon image; and
- classifying the weapon using the weapon image.
20. The method of claim 17, further comprising:
- selecting a plurality of frames, including the frame, from the video at a predetermined rate;
- queueing the frame to a face detection queue, a weapon detection queue, and a frame queue;
- ordering output from an event queue and the frame queue to fill a player queue with the video; and
- presenting the video using the player queue and a player.
Type: Application
Filed: Dec 31, 2020
Publication Date: Jul 8, 2021
Applicant: AlgoLook, Inc. (Richardson, TX)
Inventors: David E. Harrison (Frisco, TX), Jeremy C. Patton (Sachse, TX), John Dankovchik (Plano, TX)
Application Number: 17/139,673