SYSTEMS AND METHODS FOR MACHINE LEARNING DATA GENERATION AND VISUALIZATION
A system for machine learning data generation and visualization comprises a processor configured to generate a queue module that receives a data file pertaining to a problem to be addressed using a machine learning model, a feature selector module configured to select features extracted from the data file, a vectorizing module configured to generate vectorized feature data from the features, a feature generation module configured to generate data features with reduced dimensionality from the vectorized data using autoencoding techniques, a model handler module configured to select a machine learning model to analyze the data features with reduced dimensionality, to transmit the model for execution, and to receive the results of the execution, a visualizer module configured to parse a dimensionality of the results and select a visualization approach based on the dimensionality, and an output module configured to provide the results for rendering the visualization approach.
The present invention relates to information technology (IT) security, and, more particularly, relates to a system and method for machine learning data generation and visualization.
BACKGROUND OF THE DISCLOSURE
Artificial intelligence and machine learning (AI/ML) techniques are currently being employed in numerous applications in a wide range of fields. Recently, AI/ML software platforms have been developed that automate data processing procedures to afford an operator a degree of optionality and control over how machine learning models are trained, and how data is to be analyzed. For example, some platforms provide control over how input data is formatted and provide some choice as to the selection of machine learning algorithms and hyperparameters.
However, the platforms deployed to date suffer from various types of inflexibility in their ability to handle different types of input data, in their ability to generate features from the data, in their ability to apply a range of machine learning techniques and parameters, and in their ability to provide visualizations that enable operators to better analyze the results of AI/ML modelling. Owing to a lack of comprehensive flexibility, determining the optimal features, model, and parameters for a given AI/ML problem can be challenging.
SUMMARY OF THE DISCLOSURE
The present disclosure describes a non-transitory computer-readable medium comprising instructions which, when executed by a computer system, cause the computer system to carry out a method of machine learning data generation and visualization. The method includes steps of receiving a data file containing data pertinent to a problem to be addressed using a machine learning model, extracting features from the data file, vectorizing the extracted features using a plurality of vectorization techniques into vectorized feature data, generating data features with reduced dimensionality from the vectorized feature data using a plurality of autoencoding techniques, selecting an artificial intelligence/machine learning (AI/ML) model to analyze the data features with reduced dimensionality, receiving results of an execution of the selected AI/ML model, parsing a dimensionality of the received results, selecting a visualization approach for the received results based on the dimensionality, and outputting the selected visualization of results of the execution of the selected AI/ML model.
In another aspect, the present disclosure describes a system for machine learning data generation and visualization. The system comprises one or more processors, the processors having access to program instructions that when executed, generate the following modules: a queue module configured to receive a data file pertaining to a problem to be addressed using a machine learning model, a feature selector module configured to select features extracted from the data file, a vectorizing module configured to generate vectorized feature data from the features selected by the feature selector module using a plurality of vectorization techniques, a feature generation module configured to generate data features with reduced dimensionality from the vectorized feature data using a plurality of autoencoding techniques, a model handler module configured to select an artificial intelligence/machine learning (AI/ML) model to analyze the data features with reduced dimensionality, to transmit the model for execution, and to receive the results of the execution of the selected AI/ML model, a visualizer module configured to parse a dimensionality of the results obtained by the model handler module and to select a visualization approach for the obtained results based on the dimensionality, and an output module configured to provide the results to a device for rendering the visualization approach selected by the visualizer module.
Disclosed herein is a comprehensive artificial intelligence/machine learning (AI/ML) platform that provides enhanced visualization features. The platform provides several distinct state machine modules that perform tasks including data collection, transfer, identification, recursive extraction, feature identification and selection, vectorization, and auto-encoding. These steps are preparatory to and are used as inputs to further modules that perform machine learning feature extraction, algorithm selection, model structure and hyperparameter selection, model training and prediction, and result visualization. The platform machine learning models are used to train classifiers. For example, in one application, machine learning modules can be trained to classify incoming email or URL data as suspicious versus non-suspicious; in another application, machine learning modules can be trained to identify features in graphical or audio data. All of these steps are performed under operator supervision, so that the various data inputs and parameters used by the modules can be fine-tuned to optimize prediction accuracy. Importantly, the steps can be varied, repeated and compared to other model executions to help operators better understand and visualize changes in selections that improve results for particular classification problems.
At the outset it is noted that the term “module”, used in the description and accompanying figures, is defined as program code and associated memory resources, that when read and executed by a computer processor, perform certain defined procedures. For example, a “vectorizer module” comprises program code that when executed by a computer processor, performs procedures related to vectorization of data.
Hereinafter the category of techniques encompassed by AI/ML will be referred to collectively for convenience as “machine learning”; it is to be understood that “machine learning” in this context therefore can include artificial intelligence techniques that are normally not classified as machine learning techniques per se.
Referring to
The relevant source data files 104 can comprise a wide variety of original source types. Example source data files 104 can include one or more of files or byte streams in which relevant data is directly present or embedded as part of some process or function. The relevant data can include textual (alphanumeric), graphic, audio information, and combinations thereof. It is noted that the relevant source data can be presented in an obscured form and can be embedded, encrypted, or otherwise obfuscated. These techniques to obscure or hide data can be taken into account in various machine learning pattern identification algorithms disclosed herein.
Source data obtained from the device sources by the collector module 112 is passed to a cache module 114. The cache module 114 is configured to execute a hash function, such as MD5, SHA1, SHA2, etc., to uniquely identify each file received from the collector module 112. Once a file hash is computed, the cache module 114 performs a lookup of the hash in cache memory to see if the file has been analyzed before. If the hash is found in the lookup procedure, then a response is provided, allowing the cache module to discard the currently reviewed file. Otherwise, the file hash is stored and the file is passed to an encoder module 116 for encoding. The operations of the cache module 114 prevent duplication of effort by avoiding analyzing the same file more than once.
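The deduplication logic of the cache module can be sketched as follows. This is a minimal illustration, not the disclosure's implementation; SHA-256 stands in for any of the listed hash functions, and the class and method names are hypothetical.

```python
import hashlib


class FileCache:
    """Tracks hashes of previously seen files to avoid analyzing a file twice."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, data: bytes) -> bool:
        # Compute a digest to uniquely identify the byte stream.
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return True          # already analyzed; caller can discard the file
        self._seen.add(digest)   # record the hash and let the file proceed
        return False
```

A second submission of identical bytes is then flagged as a duplicate, while any byte-level difference produces a new hash and passes through.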
Returning to
A user interface 125 also interacts with the API 124 of the central node. The user interface 125 enables operators to submit files and requests directly to the API 124 and enables user control and monitoring of processes of the central node. More generally, the API 124 includes program code that when executed manages traffic between the end users and the rest of the machine learning data generation and visualization system.
The queue module 122 temporarily stores submitted files to maintain an ordered flow of analysis procedures. For instance, if numerous analysis requests are received within a short span of time, the queue module 122 can provide for a first-in first-out (FIFO), last-in first-out (LIFO) or other known method for ensuring both that the system does not get overloaded and that every submission is processed. In addition, another cache queue module de-queues files from the queue and passes each file to a decoder module for decoding (both the cache and decoder modules are not shown for ease of illustration). The decoder module decodes the file using standard byte-stream-based XOR with a key or symmetric encryption with a key, and passes it back to the cache module. The cache module analyzes the file for duplicate effort as noted above. If the file has not been analyzed, the file artifact is passed back to the queue 122 until the file is de-queued by the analytic module 130.
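The queue ordering and the byte-stream XOR decoding described above can be sketched as below. This is an illustrative simplification with hypothetical names; the disclosure's queue and decoder modules may differ in structure.

```python
from collections import deque


def xor_decode(data: bytes, key: bytes) -> bytes:
    """Byte-stream XOR with a repeating key; applying it twice restores the original."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class SubmissionQueue:
    """FIFO ordering of submitted files; a LIFO variant would pop from the right."""

    def __init__(self):
        self._q = deque()

    def enqueue(self, item):
        self._q.append(item)

    def dequeue(self):
        return self._q.popleft()  # first-in, first-out
```

Because XOR with a fixed key is its own inverse, the same routine serves for both the encoder and decoder sides of the pipeline.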
Submissions are delivered from the queue module 122 in an orderly flow to the analytic module 130 of the system, which encompasses a number of sub-modules that perform various pre-processing on the retrieved files to prepare data suitable as input to various machine learning algorithms. The first sub-module of the analysis node is an identifier module 132. The identifier module 132 is configured to analyze file artifacts as a byte stream and to identify the contents of the file as a specific type with a specific format. Additionally, the identifier module 132 is configured to interrogate the file internally utilizing various methods, such as byte-stream based “magic header” matching via tables of known file signatures, format indicators, and machine and human linguistic syntax analysis, to further analyze the file for various characteristics. These techniques are used to further identify embedded files, objects, streams, text data, general executable byte-code patterns, and random or encrypted byte patterns that can be present in the file. Identifications are stored in a central intelligence database 150 via an intermediate memory cache 145.
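The "magic header" matching performed by the identifier module can be sketched as a lookup against a table of known file signatures. The table below is a small illustrative subset of real signatures; the function name and the specific entries are assumptions, not the disclosure's tables.

```python
# Table of known file signatures ("magic headers") mapped to types.
MAGIC_TABLE = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"MZ": "pe-executable",
}


def identify(byte_stream: bytes) -> str:
    """Return a file type by matching the stream's leading bytes against
    the signature table; 'unknown' if nothing matches."""
    # Check longer signatures first so a short prefix cannot shadow a longer one.
    for magic in sorted(MAGIC_TABLE, key=len, reverse=True):
        if byte_stream.startswith(magic):
            return MAGIC_TABLE[magic]
    return "unknown"
```

A fuller identifier would combine this with the format indicators and syntax analysis mentioned above, since magic headers alone miss headerless or deliberately corrupted files.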
As the embedded links are identified, the file is passed to a recursive extractor module 134 (“recursive extractor”) that is configured to extract the embedded items from the file recursively. The recursive extractor 134 continues to break down the file into component parts or artifacts until all embedded artifacts have been extracted and no further meaningful data can be obtained from the original file (i.e., the file has been broken down into its minimal constituent elements). One way this can be determined is when an extraction step yields the same artifacts and data as a previous extraction step, indicating that no further artifacts can be yielded from the file. Once each file is reduced down to a non-reducible level, it is passed to a metadata extractor module 136 (“metadata extractor”) that is configured to extract any additional metadata from the file and artifacts such as, but not limited to, links, string patterns, byte-code patterns, magic identifiers, author, creation timestamps, modification timestamps, programming language syntax identification, human language identification, domains, IP addresses, MAC addresses, geo-location identifiers, phone numbers, physical addresses, etc. The extracted metadata is stored in the central database 150. From the metadata extractor 136, the file and artifact data are passed to an additional query sub-module 138. The query module 138 is communicatively coupled to the central database 150 and to other external sources of relevant data. The external sources are collectively represented and referred to as the Intel database 160. The query module 138 collects all results obtained from the queries into a single dataset or multiple datasets for feature selection.
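The fixpoint condition of the recursive extractor, stopping when a pass yields nothing that has not already been seen, can be sketched generically. The extraction of embedded items is delegated to a caller-supplied function, since its details depend on the file format; all names here are illustrative.

```python
def recursive_extract(artifact, extract_embedded):
    """Break an artifact into component parts until no extraction step
    yields anything new (the non-reducible level described above).

    `extract_embedded` is a caller-supplied function that returns the
    embedded items found within one artifact."""
    found = []
    frontier = [artifact]
    seen = set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue  # same artifact as a previous step: recursion stops here
        seen.add(current)
        found.append(current)
        frontier.extend(extract_embedded(current))
    return found
```

With a toy extractor over a nesting map, the routine visits every reachable artifact exactly once, even when artifacts are embedded in more than one parent.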
The feature selector module 140 is configured to select data sets or data points from within the newly collected data sets obtained by the query module 138 and to establish a sub-set of data sets or data points for analysis by a vectorizer module 142.
The different vectorization methods can be executed simultaneously or in series, and the vectorizer module 142 can be configured to execute all of the methods or only a subset of them depending on operator input. All vectorizations are stored in the central database 150 and made available for analysis by subsequent modules and, in some instances, by operators, and more generally for future correlations and analyses. The vectorized data sets output by the vectorizer module are provided to the feature generator module 144.
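One of the simpler vectorization methods can be sketched as a byte-frequency histogram, which maps any byte stream onto a fixed-length numeric vector. This is an illustrative stand-in for the "direct" vectorization named in the claims; the disclosure does not specify this exact encoding.

```python
def direct_vectorize(data: bytes) -> list:
    """Direct vectorization sketch: a 256-bin byte-frequency histogram,
    normalized so inputs of different lengths are comparable."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1  # guard against empty input
    return [c / total for c in counts]
```

The resulting fixed-length vectors can be fed uniformly to the autoencoders downstream regardless of the original file's size or type.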
The purpose of the feature generator module 144 is to convert the input vectorized data into the best available features: those that are most distinctive, prominent, or valuable to the model in making decisions on the dataset during training and prediction, in service of the model's objective. In this sense, the various autoencoding approaches each have a different transformative effect upon the input vectorized data. The combination of approaches can be used collectively to derive features that can be used in the training and testing process to achieve high value models that are able to achieve their objective with the highest degree of confidence.
More specifically, the sparse autoencoder 402 employs a loss function on the input vectorized data that is constructed so that activations are penalized within a layer, which has the effect of favoring fewer active neurons and reducing the dimensionality of the input data. A sparsity constraint can be imposed with L1 regularization or a KL divergence between the expected average neuron activation and an ideal distribution. The denoising autoencoder 404 randomly converts some of the input vectorized data to zero in order to avoid undesired outputs of the identity and null functions. In other words, the denoising autoencoder helps avoid the feature generator arriving at features that are equivalent to the input data and are not helpful in model prediction. The contractive autoencoder 406 is configured by code to avoid overfitting to the vectorized input data by adding a regularizer (penalty) term to whatever cost function is being minimized. Like the sparse autoencoder, the penalty favors the generation of features that have fewer parameters than the input data due to the penalties imposed on the weightings of the parameters. The variational autoencoder 408 introduces regularization to avoid overfitting in another way, by encoding input values as distributions rather than as unique values. The variational autoencoder 408 also typically generates compressed data with fewer parameters than the input data due to the manner in which input values are represented. All feature sets generated by autoencoders 402-408 are stored in the central database 150 and made available for ingestion by following modules, as well as for current and future operations and analysis by operators.
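Two of the mechanisms above, the sparse autoencoder's L1 activation penalty and the denoising autoencoder's input corruption, can be sketched in isolation. These are textbook formulations offered for illustration; the disclosure's autoencoders 402-408 are full networks, and the function names and default constants here are assumptions.

```python
import numpy as np


def sparse_autoencoder_loss(x, x_hat, hidden, l1_weight=1e-3):
    """Loss for a sparse autoencoder: reconstruction error plus an L1
    penalty on hidden-layer activations. The penalty drives most hidden
    units toward zero, reducing the effective dimensionality."""
    reconstruction = np.mean((x - x_hat) ** 2)
    sparsity = l1_weight * np.sum(np.abs(hidden))
    return reconstruction + sparsity


def add_denoising_noise(x, zero_fraction=0.3, rng=None):
    """Denoising-autoencoder corruption: randomly zero a fraction of the
    inputs, so that learning the identity function is no longer a
    trivial solution."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= zero_fraction
    return x * mask
```

In training, the corrupted input is fed forward while the loss is still computed against the clean input, forcing the network to learn structure rather than a copy.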
After features have been generated by the autoencoders 402-408 of the feature generation module 144, the vectorized, compressed data is passed to a model handler module 146. The model handler module 146 generates one or more models based on user-input configuration schema. The models comprise a model structure, hyperparameters, and specific algorithms. The model handler module 146 delivers the set parameters of the selected model(s) to a machine learning operational systems provider 160 (ML implementer) to implement the models for training or prediction. The “models” referred to here are artificial intelligence or machine learning algorithms. Such models can include, but are not limited to, Bayesian, k-Nearest Neighbor (kNN), Support Vector Machines (SVM), deep learning networks such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Long Short-Term Memory networks (LSTMs), AdaBoost, and Gradient Boosting Machines.
Many of the models are supervised machine learning algorithms (or combinations thereof). Supervised machine learning algorithms employ forward and backward propagation, a loss function and an optimization algorithm such as gradient descent to train a classifier. In each iteration of the optimization algorithm on training data, outputs based on estimated feature weights are propagated forward and the output is compared with data that has been classified (i.e., which has been identified by type). The estimated weights are then modified during backward propagation based on the difference between the output and the tagged classification, as a function of the code used to implement this aspect of the ML algorithm. This process repeats until the weights are optimized for the training data. Generally, the machine learning algorithm is supervised, meaning that it uses human-tagged or classified data as a basis from which to train. However, in a prefatory stage, an unsupervised classification algorithm can be employed for initial classification as well.
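The forward/backward training iteration described above can be illustrated with a minimal gradient-descent classifier. Logistic regression is used here purely as the simplest concrete instance of the loop; the disclosure's models are generally richer, and the learning rate and epoch count are illustrative defaults.

```python
import numpy as np


def train_classifier(X, y, lr=0.5, epochs=500):
    """Gradient-descent training loop: forward pass from estimated
    weights, comparison with the tagged labels, and backward pass
    updating the weights by the resulting gradient."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Forward propagation: predictions from the current weight estimates.
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # Backward propagation: cross-entropy gradient, driven by the
        # difference between the output and the tagged classification.
        grad_w = X.T @ (p - y) / len(y)
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On a small linearly separable set, the loop converges to weights that reproduce the tagged labels, which is exactly the threshold-accuracy check the operational executor applies downstream.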
In one exemplary embodiment, shown in
These modules 602-608 can present different menus and graphical features through the user interface 125 to aid the user in selecting various features and variables of a machine learning algorithm to test the received data. It should be understood that in alternative embodiments, the functionality of the different modules can be combined in fewer sub-modules or distributed among a larger number of sub-modules. Each module uses the configuration parameters to determine selections for use, including mode, mixture of selections, and variations (i.e., of vectorizing and feature generation techniques, and machine learning model). The configuration parameters also determine when a model passes (successfully meets a threshold) or runs out of time. Configuration parameters are stored as JSON (JavaScript Object Notation) objects for each project or session. In general, the configuration parameters provide guidance and limits as to the approach taken by each module in succession in order to limit the extent of resources being utilized.
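A per-project configuration object of the kind described above can be sketched as a JSON document parsed with the standard library. The field names below are hypothetical illustrations of the sort of guidance and limits the parameters carry; the disclosure does not specify a schema.

```python
import json

# Hypothetical per-project configuration; field names are illustrative.
config_json = """
{
  "project": "email-classifier",
  "vectorizers": ["direct", "meta-enhanced", "fuzzy"],
  "autoencoders": ["sparse", "denoising"],
  "model": {"type": "svm", "mode": "brute-force"},
  "pass_threshold": 0.95,
  "time_limit_seconds": 3600
}
"""

config = json.loads(config_json)


def passes(accuracy, cfg):
    """A model 'passes' when its measured accuracy meets the configured threshold."""
    return accuracy >= cfg["pass_threshold"]
```

Storing the parameters as JSON per project or session lets each module in succession read the same object to determine its mode, mixture of selections, and resource limits.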
The model handler 146 can be configured to select model structures and hyperparameters according to different modes. In a brute force mode, the handler permutes across a set range of all possible values appropriate to each selected model. In a second mode, a range is preselected, and the model handler selects only values from within the set range of values for model hyperparameters, structure, layers, etc. per each machine learning approach and selected models. In addition, a time limit for model evaluation can be set by the operator, which limits the computations of the possible structure and hyperparameter values. The operator can select among the values computed prior to the time limit. The values are dependent on the machine learning approach taken, such as Bayesian, Multi-Variate Bayesian, KNN, SVM, and many others within the Deep Learning approaches.
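The brute force mode's permutation over hyperparameter values can be sketched with a standard Cartesian product. The parameter names in the example are illustrative; in the second mode, the operator would simply pass narrower preselected ranges to the same routine.

```python
from itertools import product


def brute_force_grid(param_ranges):
    """Brute-force mode: yield every combination of the candidate values
    for each hyperparameter, one configuration dict at a time."""
    names = sorted(param_ranges)
    for values in product(*(param_ranges[n] for n in names)):
        yield dict(zip(names, values))
```

Because the generator is lazy, a time limit can be enforced by stopping iteration when the clock expires, leaving the operator to choose among the configurations evaluated so far.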
The operational executor 608 delivers all of the data input regarding the selected model(s) to machine learning operational systems 160 (“ML implementer”), which can be local or cloud-based. Once executed, the ML systems return the results of the training or prediction to the operational executor 608. The operational executor 608 is configured to analyze the outputs of the ML implementer 160. Based on the analysis, the operational executor 608 determines if the training meets threshold criteria configurable by the operator. The threshold criteria typically pertain to the measured accuracy of a model in identifying and classifying the input data. If the training does not meet the set criteria (i.e., is not sufficiently accurate), the operational executor 608 is configured by code to initiate an additional round of feature selection starting at the query module 138. Alternatively, if the threshold criteria are met, the operational executor 608 is configured by code to accept the results and deliver them onward for output and monitoring. Over time, the model handler 146 as a whole can generate numerous different models for the ML implementer 160 to train, and the results of the different models can be analyzed and compared.
The operational executor 608 is configured to evaluate whether the model's results meet the criteria to be declared a useful or successful model. These criteria are based on accuracy, balanced accuracy, precision, recall, and variations of the confusion matrix. Variations of the confusion matrix can include Matthews Correlation Coefficient (MCC), True Positive/Negative rates, Positive/Negative Predictive rates, Fowlkes-Mallows index, informedness, markedness (delta-p), etc. Models with the highest ratings, based on metrics set by the operator, are deemed useful or successful models, and the highest-rated or top-n models can be configured to be selected as the “winner” models.
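Several of these evaluation metrics derive directly from confusion-matrix counts, as the following sketch shows. The formulas are the standard ones; only the function name and return format are illustrative.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and Matthews Correlation Coefficient
    computed from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc}
```

MCC is often preferred for ranking candidate models on imbalanced data, since a classifier can score high accuracy while its MCC stays near zero.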
In the embodiment depicted in
The area below the performance and model approach elements includes a set of control elements that enable the operator to configure the modules discussed above and other settings. For example, a vectorizer control element 730 enables the operator to select, activate or disable one or more vectorization operations. A feature generator control element 735 enables the operator to select, activate or disable one or more autoencoder algorithms. A model structure element 740 enables the operator to select, activate or disable one or more model structures and hyperparameters. A training and testing control element 745 enables the operator to set options for displaying performance, among other functions, and a resources control element 750 enables the operator to set an extent of computational resources to be allocated to the training and testing procedures of the project.
It is to be understood that the user interface screen 600 is only one of many different screens through which the user provides inputs for configuring, controlling and monitoring the numerous parameters and options available. For example, there will be a different user interface screen presented for each type of machine learning algorithm, as each algorithm requires different inputs, parameters and settings. The interface of
All output data, including visualized or general information output related to data sets, data vectorization, data set feature extractions, model selection, model structure, model hyperparameters, model and algorithm performance measures and metrics are handled by the output handler 168. In the embodiment depicted in
The system described above can be used for the application of machine learning in a systematic way to solve or shed light on a vast variety of problems. At the outset, it is not necessarily known which data set, data features, algorithmic approach, and model structure would be most effective. The training results provided by the visualizer 164 and output modules 168 according to this disclosure, however, identify important datasets and dataset features. This output informs the operators and can directly influence the selected algorithmic approach, the model structure and hyperparameters. By identifying the appropriate visualization approach that is suitable for the dimensionality of the data, the system better enables operators to evaluate multiple machine learning models to find the best collection of datasets, features, algorithms, and models for identification and classification. For example, certain types of machine learning models might be optimal for classifying certain types of data. In any event, differences in outcomes provide insight to the monitoring operators with respect to the pertinent problem.
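The visualizer's step of parsing result dimensionality and choosing a rendering approach can be sketched as a simple policy function. The mapping below is an illustrative assumption drawn from the visualization types listed in the claims, not the disclosure's exact selection rules.

```python
def select_visualization(results):
    """Parse the dimensionality of a result set and pick a rendering
    approach. The dimensionality is taken from the width of each
    result record."""
    first = results[0] if results else 0
    dims = len(first) if hasattr(first, "__len__") else 1
    if dims == 1:
        return "histogram"
    if dims == 2:
        return "line plot"
    if dims == 3:
        return "three-dimensional map"
    return "heat map"  # higher-dimensional data shown as a projection
```

The output module would then pass the chosen approach name, together with the transformed results, to the rendering device.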
The disclosed systems and methods provide an end-to-end approach to solving problems utilizing machine learning in a broad and flexible manner. The systems and methods include intelligence database integration and correlation, fuzzing of feature selections, and various output visualizations and API functionality for integration with other cybersecurity operational systems.
Organizations can utilize the disclosed system and methods to collect datasets, transform data, and identify essential and important features that models use to solve a specific problem using machine learning. The solution helps identify the optimal datasets, features, algorithms, model structure, and model hyperparameters which perform best in solving specific use-case problems. Additionally, the disclosed systems and methods, when implemented, can be utilized for full machine learning lifecycle development, testing, training, and operationalization, including model retraining, model retention, model bias, and model decay over time.
There are many types of applications to which the machine learning system of the present disclosure can be gainfully applied. For example, uses in the cybersecurity field include domain look-alike (doppelganger) identification, anomaly detection across user behaviors, anomaly detection on logs, anomaly detection on network behaviors, anomaly detection on a sinkhole (where it collects data, like a black hole on the network), and authentication-based anomaly detection.
Another useful function that this system and method provides is in optimizing other machine learning processes. For example, the disclosed system can be used to assess and retrain existing models, or to replace existing models entirely with new models that have proven to be better predictors for a specific problem or use case. Normally, as models are trained utilizing a specific set of data relevant to the context of an organization's environment, they are useful and effective. But over time, the datasets, users, user behavior patterns, adversarial patterns, tools and tactics change, which can render models that were previously successful predictors ineffective. A system that can continuously ingest existing data sets, vectorize, featurize, and test numerous possible models across various approaches can quickly identify better vectorization approaches, better feature generation approaches, and better ML approaches, with models having better hyperparameters and structures, allowing the model to be retrained or replaced entirely to meet a project objective. In this case, the model handler can pull and push models from the machine learning operational systems (AI/ML platform), and retrain, replace, or augment an existing model to cover other gaps, in order to make the entirety of the approach meet the project prediction objectives.
It should be understood that all of the system components described herein such as collector nodes, analysis modules, etc. are embodied using computer hardware (microprocessors, parallel processors, solid-state memory or other memory, etc.), firmware and software as understood by those of skill in the art and can include servers, workstations, mobile computing devices, as well as associated networking and storage devices. Communications between devices can occur over wired or wireless communication media and according to any suitable communications system or protocol.
It is to be understood that any structural and functional details disclosed herein are not to be interpreted as limiting the systems and methods, but rather are provided as a representative embodiment and/or arrangement for teaching one skilled in the art one or more ways to implement the methods.
It is to be further understood that like numerals in the drawings represent like elements throughout the several figures, and that not all components and/or steps described and illustrated with reference to the figures are required for all embodiments or arrangements.
The terminology used herein is for describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Terms of orientation are used herein merely for purposes of convention and referencing and are not to be construed as limiting. However, it is recognized these terms could be used with reference to a viewer. Accordingly, no limitations are implied or to be inferred.
Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having,” “containing,” “involving,” and variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.
While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes can be made and equivalents can be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications will be appreciated by those skilled in the art to adapt a particular instrument, situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims.
Claims
1. A non-transitory computer-readable medium comprising instructions which, when executed by a computer system, cause the computer system to carry out a method of machine learning data generation and visualization, the method including steps of:
- receiving a data file containing data pertinent to a problem to be addressed using a machine learning model;
- extracting features from the data file;
- vectorizing the extracted features using a plurality of vectorization techniques into vectorized feature data;
- generating data features with reduced dimensionality from the vectorized feature data using a plurality of autoencoding techniques;
- selecting an artificial intelligence/machine learning (AI/ML) model to analyze the data features with reduced dimensionality;
- receiving results of an execution of the selected AI/ML model;
- parsing a dimensionality of the received results;
- selecting a visualization approach for the received results based on the dimensionality; and
- outputting the selected visualization of results of the execution of the selected AI/ML model.
2. The non-transitory computer readable medium of claim 1, further comprising instructions which, when executed by a computer system, cause the computer system to execute the steps, prior to vectorization, of:
- recursively extracting data embedded in the data file; and
- extracting meta-data from the data file and artifacts obtained from recursive extraction.
3. The non-transitory computer readable medium of claim 1, further comprising instructions which, when executed by a computer system, cause the computer system to execute the step, prior to vectorization, of performing a query on a database based on the data in the file and extracted meta-data.
4. The non-transitory computer-readable medium of claim 1, wherein the method further comprises, after selecting a visualization approach and before outputting the selected visualization of results, transforming and structuring the results of the execution of the selected AI/ML model for the selected visualization approach.
5. The non-transitory computer-readable medium of claim 4, wherein the visualization approach includes one or more of a histogram, a bar chart, a pie chart, a plot, a line plot, a time series plot, a relationship map, a heat map, a geo-tagged or geo-location-based map, a three-dimensional map, an animation, a syntax-based plot, and a word-based plot.
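The transforming-and-structuring step of claim 4 can be illustrated for one of the visualization approaches listed in claim 5. The sketch below is a hypothetical structuring of scalar results into histogram bins; the output dictionary format is an assumption for illustration, not a format defined by the claims.

```python
# Illustrative structuring of model results for a histogram visualization
# (claims 4 and 5); bin count and output keys are hypothetical.

def structure_for_histogram(results, bins=4):
    lo, hi = min(results), max(results)
    width = (hi - lo) / bins or 1.0   # guard against zero-width bins
    counts = [0] * bins
    for r in results:
        # Clamp the top edge value into the last bin.
        idx = min(int((r - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return {"edges": edges, "counts": counts}
```

A renderer receiving this structure needs only the edges and counts, which is the point of structuring the results before output rather than shipping raw model output to the device.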
6. The non-transitory computer-readable medium of claim 1, wherein the plurality of vectorization techniques includes direct vectorization, meta-enhanced vectorization, and fuzzy vectorization.
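The three vectorization techniques named in claim 6 can be sketched as follows. The readings of "meta-enhanced" and "fuzzy" here are plausible interpretations for illustration only; the claims do not define these terms, and all function names are hypothetical.

```python
# Illustrative sketches of the vectorization techniques of claim 6.
import difflib

def direct_vectorize(token: str) -> list:
    # Direct: map each character of the feature to its code point.
    return [float(ord(c)) for c in token]

def meta_enhanced_vectorize(token: str, metadata: dict) -> list:
    # Meta-enhanced (assumed reading): augment the direct vector with
    # numeric meta-data extracted from the file (e.g. size, timestamps).
    return direct_vectorize(token) + [float(v) for v in metadata.values()]

def fuzzy_vectorize(token: str, vocabulary: list) -> list:
    # Fuzzy (assumed reading): similarity of the feature to each entry
    # of a known vocabulary, yielding a fixed-length similarity vector.
    return [difflib.SequenceMatcher(None, token, w).ratio()
            for w in vocabulary]
```

Fuzzy vectorization is notable because it produces vectors of a fixed length (the vocabulary size) regardless of input length, which simplifies the downstream autoencoding step.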
7. The non-transitory computer-readable medium of claim 1, wherein the plurality of autoencoding techniques includes sparse, denoising, contractive, and variational autoencoding.
8. The non-transitory computer-readable medium of claim 1, wherein the selected AI/ML model comprises a supervised machine learning model.
9. The non-transitory computer-readable medium of claim 8, wherein the method further comprises selecting parameters for execution of the selected supervised machine learning model, the parameters including at least one of a learning rate, a number of epochs, and a batch size.
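The hyperparameter selection recited in claim 9 can be sketched as a small search over candidate values of the three named parameters. The candidate grids and the scoring callback below are illustrative assumptions; the claims do not specify how the parameters are chosen.

```python
# Hypothetical hyperparameter selector in the spirit of claim 9: picks a
# learning rate, number of epochs, and batch size by minimizing a
# caller-supplied scoring function (e.g. validation loss).
from itertools import product

def select_hyperparameters(score_fn,
                           learning_rates=(0.1, 0.01),
                           epochs=(10, 50),
                           batch_sizes=(16, 32)):
    best, best_score = None, float("inf")
    for lr, ep, bs in product(learning_rates, epochs, batch_sizes):
        params = {"learning_rate": lr, "epochs": ep, "batch_size": bs}
        s = score_fn(params)           # lower score is better
        if s < best_score:
            best, best_score = params, s
    return best
```

In a real deployment the scoring function would train and validate the supervised model once per candidate; the grid-search structure shown here is only one of many possible selection strategies.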
10. A system for machine learning data generation and visualization comprising:
- one or more processors, the processors having access to program instructions that when executed, generate the following modules:
- a queue module configured to receive a data file pertaining to a problem to be addressed using a machine learning model;
- a feature selector module configured to select features extracted from the data file;
- a vectorizing module configured to generate vectorized feature data from the features selected by the feature selector module using a plurality of vectorization techniques;
- a feature generation module configured to generate data features with reduced dimensionality from the vectorized feature data using a plurality of autoencoding techniques;
- a model handler module configured to select an artificial intelligence/machine learning (AI/ML) model to analyze the data features with reduced dimensionality, to transmit the model for execution, and to receive the results of the execution of the selected AI/ML model;
- a visualizer module configured to parse a dimensionality of the results obtained by the model handler module and to select a visualization approach for the obtained results based on the dimensionality; and
- an output module configured to provide the results to a device for rendering the visualization approach selected by the visualizer module.
11. The system of claim 10, further comprising:
- a recursive extractor module configured to recursively extract data embedded in the data file; and
- a meta-data extractor module configured to extract metadata from the file and artifacts obtained from the recursive extractor module.
12. The system of claim 11, further comprising a query module configured to perform a query on a database based on the data in the data file and the extracted meta-data.
13. The system of claim 10, wherein the visualizer module is further configured to transform and structure the results of the execution of the selected AI/ML model for the selected visualization approach after selecting a visualization approach and before outputting the selected visualization of results.
14. The system of claim 13, wherein the visualization approach selected by the visualizer module includes one or more of a histogram, a bar chart, a pie chart, a plot, a line plot, a time series plot, a relationship map, a heat map, a geo-tagged or geo-location-based map, a three-dimensional map, an animation, a syntax-based plot, and a word-based plot.
15. The system of claim 10, wherein the vectorizing module is configured to vectorize feature data using direct vectorization, meta-enhanced vectorization, and fuzzy vectorization.
16. The system of claim 10, wherein the feature generation module is configured to generate the data features using sparse, denoising, contractive, and variational autoencoding.
17. The system of claim 10, wherein the AI/ML model selected by the model handler module comprises a supervised machine learning model.
18. The system of claim 17, wherein the model handler module includes a hyperparameter selector module configured to receive parameters for execution of the selected supervised machine learning model, including at least one of a learning rate, a number of epochs, and a batch size.
Type: Application
Filed: Feb 26, 2021
Publication Date: Sep 1, 2022
Inventor: Aminullah Sayed Tora (Dhahran)
Application Number: 17/187,469