MULTI-API METRIC MODELING USING LSTM SYSTEM
The present system models multiple application program interfaces (APIs) and determines anomalous behavior for the group of APIs. The system APIs are monitored and data is collected for the multiple APIs. Metrics are generated for the APIs and reported to an application. The metrics are a raw timeseries stream of metrics and are transformed to a different domain for processing. In some instances, the raw time series metric data is smoothed or averaged into an average domain. A model receives the smoothed time series metric data, a pilot signal, and homogeneous signal inputs. The model may include an LSTM model or some other model. The LSTM model may output data to a neural network, which then provides output of a predicted value of the metrics, current value of the metric, and a regenerated pilot signal. A determination is made as to whether the neural network system predicts the pilot signal correctly, and if so the predicted metric is compared to the actual metric. If the metrics do not match, a degree of anomaly is determined and reported. In some instances, whether an anomaly is reported for the group of APIs depends on how many anomalies are detected and the severity of each anomaly.
The present application claims the priority benefit of U.S. provisional patent application 63/167,649, filed on Mar. 30, 2021, titled “INTELLIGENT APPLICATION PROTECTION,” the disclosure of which is incorporated herein by reference.
BACKGROUND
When analyzing, tracking, or monitoring a system, it is often important to know how one or more application program interfaces (APIs) are performing. Many systems have hundreds or thousands of APIs. Monitoring and tracking the performance of each API can require large amounts of resources and time. What is needed is an improved mechanism for monitoring and detecting anomalous behavior for multiple APIs.
SUMMARY
The present system models multiple application program interfaces (APIs) and determines anomalous behavior for the group of APIs. The system APIs are monitored and data is collected for the multiple APIs. Metrics are generated for the APIs and reported to an application. The metrics can include a raw timeseries stream of metrics and can be transformed to a different domain for processing. In some instances, the raw time series metric data is smoothed or averaged into an average domain. A model receives the smoothed time series metric data, a pilot signal, and homogeneous signal inputs. The model may include an artificial recurrent neural network model, such as for example a long short-term memory (LSTM) model, or other model. The LSTM model may output data to a neural network, which can provide output of a predicted value of the metrics, current value of the metric, and a regenerated pilot signal.
Once the predictions are generated, a determination can be made as to whether the system predicts a pilot signal correctly, and if so the predicted metric is compared to the actual metric. If the metrics do not match within a particular margin or threshold, a degree of anomaly is determined and reported. In some instances, whether an anomaly is reported for the group of APIs depends on how many anomalies are detected and the severity of each anomaly.
In some instances, a method automatically forecasts values for system metric time series. The method begins with receiving raw time series metric data by an application on an application server, the raw time series metric data associated with an API provided by a remote server, the raw time metric data associated with request and responses to the API from a plurality of remote devices in communication with the remote server. The method continues with transforming, by the application on the application server, the received raw time series metric data into smoothed time series metric data. The smoothed time series metric data are passed through a prediction model, the prediction model trained using training data prior to receiving smoothed time series metric data. One or more homogeneous variables associated with the API are passed through the prediction model. The method continues with generating a prediction for the time series metric data in a smoothed format, the prediction based on the smoothed time series metric data and the one or more homogeneous variables associated with the API. The smoothed predicted time series metric data are then transformed into a raw predicted time series metric value. The raw predicted time series metric values are analyzed to determine whether raw time series metric data is anomalous.
In some instances, a non-transitory computer readable storage medium has embodied thereon a program that is executable by a processor to perform a method. The method automatically forecasts values for system metric time series. The method begins with receiving raw time series metric data by an application on an application server, the raw time series metric data associated with an API provided by a remote server, the raw time metric data associated with request and responses to the API from a plurality of remote devices in communication with the remote server. The method continues with transforming, by the application on the application server, the received raw time series metric data into smoothed time series metric data. The smoothed time series metric data are passed through a prediction model, the prediction model trained using training data prior to receiving smoothed time series metric data. One or more homogeneous variables associated with the API are passed through the prediction model. The method continues with generating a prediction for the time series metric data in a smoothed format, the prediction based on the smoothed time series metric data and the one or more homogeneous variables associated with the API. The smoothed predicted time series metric data are then transformed into a raw predicted time series metric value. The raw predicted time series metric values are analyzed to determine whether raw time series metric data is anomalous.
In embodiments, a system can include a server, memory and one or more processors. One or more modules may be stored in memory and executed by the processors to receive raw time series metric data by an application on an application server, the raw time series metric data associated with an API provided by a remote server, the raw time metric data associated with request and responses to the API from a plurality of remote devices in communication with the remote server, transform, by the application on the application server, the received raw time series metric data into smoothed time series metric data, pass the smoothed time series metric data through a prediction model, the prediction model trained using training data prior to receiving smoothed time series metric data, pass one or more homogeneous variables associated with the API through the prediction model, generate a prediction for the time series metric data in a smoothed format, the prediction based on the smoothed time series metric data and the one or more homogeneous variables associated with the API, transform the smoothed predicted time series metric data into a raw predicted time series metric value, and analyze the raw predicted time series metric value to determine whether raw time series metric data is anomalous.
The present system models multiple application program interfaces (APIs) and determines anomalous behavior for the group of APIs. The system APIs are monitored and data is collected for the multiple APIs. Metrics are generated for the APIs and reported to an application. The metrics can include a raw timeseries stream of metrics and can be transformed to a different domain for processing. In some instances, the raw time series metric data is smoothed or averaged into an average domain. A model receives the smoothed time series metric data, a pilot signal, and homogeneous signal inputs. The model may include an artificial recurrent neural network model, such as for example a long short-term memory (LSTM) model, or other model. The LSTM model may output data to a neural network, which can provide output of a predicted value of the metrics, current value of the metric, and a regenerated pilot signal.
Once the predictions are generated, a determination can be made as to whether the system predicts a pilot signal correctly, and if so the predicted metric is compared to the actual metric. If the metrics do not match within a particular margin or threshold, a degree of anomaly is determined and reported. In some instances, whether an anomaly is reported for the group of APIs depends on how many anomalies are detected and the severity of each anomaly.
Client devices 110-140 may send API requests to and receive API responses from customer server 150. The client devices may be any device which can access the service, network page, webpage, or other content provided by customer server 150. Client devices 110-140 may send a request to customer server 150, for example to an API provided by customer server 150, and customer server 150 may send a response to the devices based on the request. The request may be sent to a particular URL provided by customer server 150 and the response may be sent from the server to the device in response to the request. Though only four client devices are shown, a typical system may handle requests from a larger number of clients, for example, dozens, hundreds, or thousands, and any number of client devices may be used to interact with customer server 150.
Customer server 150 may provide a service to client devices 110-140. The service may be accessible through APIs provided by customer server 150. Agent 152 on customer server 150 may monitor the communication between customer server 150 and client devices 110-140 and intercept traffic transmitted between the server and the devices. Upon intercepting the traffic, agent 152 may forward the traffic to application 172 on application server 170. In some instances, one or more agents may be installed on customer server 150, which may be implemented by one or more physical or logical machines. In some instances, server 150 may actually be implemented by multiple servers in different locations, providing a distributed service for devices 110-140. In any case, one or more agents 152 may be installed to intercept API requests and responses between devices 110-140 and customer server 150, in some instances may aggregate the traffic by API request and response data, and may transmit request and response data to application 172 on server 170.
Network 160 may include one or more private networks, public networks, intranets, the Internet, wide-area networks, local area networks, cellular networks, radio-frequency networks, Wi-Fi networks, any other network which may be used to transmit data, and any combination of these networks. Client devices 110-140, customer server 150, application server 170, and data store 180 may all communicate over network 160 (whether or not labeled so in
Application server 170 may be implemented as one or more physical or logical machines that provide application functionality as described herein. In some instances, application server 170 may include one or more applications 172. The application 172 may be stored on one or more application servers 170 and be executed to perform functionality as described herein. Application server 170 and application 172 may both communicate over network 160 and with data store 180. Application 172 is discussed in more detail with respect to
Data store 180 may be accessible by application server 170 and application 172. In some instances, data store 180 may include one or more APIs, API descriptions, metric data, and other data discussed herein.
Metric data 220 may include metrics that are generated from data collected from service APIs. The generated metrics may include an API response time, API errors per time period, API calls per time period, and other metrics. In some instances, one or more of the collected metrics can be averaged by metric data 220. Metric data 220 can also concatenate metrics and standardize metrics.
Anomaly detection 230 may determine if metric data should be identified as an anomaly. Determining whether metric data may be an anomaly may include comparing a predicted metric to an actual metric value, determining a discrepancy between the actual and predicted value, and then categorizing an API as not anomalous, slightly anomalous, very anomalous, or extremely anomalous. In some instances, for a service having many APIs, such as for example a process for progressing through adding a product to a cart and performing checkout for an ecommerce service, one model may be used for a plurality of APIs of the particular service, and the anomaly detection may be used for the service APIs. For example, one model as discussed herein (e.g., an LSTM and neural network pair) may be used to determine whether a set or family of APIs associated with adding a product to a cart and performing checkout is anomalous.
Homogeneous data 240 may collect and analyze homogeneous inputs received from a system having multiple APIs. In some instances, the homogeneous signal inputs may serve as additional layers to an LSTM model. Examples of homogeneous signal inputs may include a request size signal for an API, a request key signal for an API, a response size signal for an API, and a response key signal for an API.
Model 250 may implement a model for processing inputs to predict a metric value or values. In some instances, model 250 may be an LSTM model. In any case, the model 250 may receive smoothed signals, a pilot signal, and homogeneous signal inputs, and may process these inputs to provide an output to a neural network 260. In some instances, the model may implement an automated process that is trained through machine learning over time.
Neural network 260 may receive the output of the model and predict a pilot signal, as well as perform other functionality described herein. The neural network system prediction of the pilot signal may determine if the output of the model is used for anomaly analysis. In some instances, the neural network may implement an automated process that is trained through machine learning over time.
A raw timeseries data stream of metrics is provided to an application on a server at step 325. The raw timeseries stream may be provided periodically, such as every 5, 10, or 30 seconds, every minute, based on an event such as a timer or a push or pull message, or in some other manner.
The raw timeseries metric data may be transformed into a smooth timeseries metric data stream for each service API or group of APIs at step 330. The raw timeseries metric data, existing in a first domain, is transformed into a second domain associated with smoothed data. In some instances, the smoothed data may be generated from the raw timeseries metric data by taking the moving average of the raw timeseries metric data. In some instances, other techniques may be used to smooth the data.
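The moving-average transform described for step 330 can be illustrated with a minimal sketch. The function name `moving_average`, the trailing window of three samples, and the sample response times are assumptions made for illustration; the description does not fix a window length or a particular averaging variant.

```python
from collections import deque
from typing import Iterable, List


def moving_average(raw: Iterable[float], window: int = 3) -> List[float]:
    """Trailing moving average; early points average over the samples seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque(maxlen=window)  # holds at most `window` most recent samples
    total = 0.0
    smoothed = []
    for value in raw:
        if len(buf) == window:
            total -= buf[0]  # this sample is about to fall out of the window
        buf.append(value)
        total += value
        smoothed.append(total / len(buf))
    return smoothed


# Hypothetical raw API response times in milliseconds.
raw_response_ms = [100.0, 120.0, 110.0, 400.0, 105.0]
smoothed = moving_average(raw_response_ms, window=3)
print(smoothed)
```

In this sketch the spike of 400 ms is spread across the following window positions, which is the kind of smoothing the second domain is meant to provide.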
The smoothed signals are received by a predictive model at step 335. In some instances, the received smoothed signals are for past metric values up to a current time. The predictive model may be implemented by a recurrent neural network model, such as an LSTM model. A pilot signal is then received by the model at step 340. The pilot signal identifies a particular API or set of APIs to which the metric data pertains.
Homogeneous signal inputs are received at step 345. The homogeneous signal inputs can serve as additional layers to be processed by the model at the current time. The homogeneous signal inputs are inputs that are typically the same for each request and response for a particular API. More details for receiving homogeneous signal inputs are discussed with respect to the method of
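As a rough illustration of how the three kinds of input named in steps 335 through 345 might be combined, the sketch below assembles one flat feature vector. Everything specific here is an assumption: the API names in `API_IDS`, the one-hot encoding of the pilot signal, the dictionary keys, and the treatment of the request and response key signals as numeric values are all hypothetical, since the description does not specify an encoding.

```python
from typing import Dict, List, Sequence

# Hypothetical API family for an ecommerce checkout flow.
API_IDS = ["add_to_cart", "checkout", "payment"]

# Hypothetical names for the four homogeneous signal inputs.
HOMOGENEOUS_KEYS = ("request_size", "request_key", "response_size", "response_key")


def one_hot_pilot(api_id: str, api_ids: Sequence[str] = API_IDS) -> List[float]:
    """Encode the pilot signal as a one-hot vector identifying the source API."""
    if api_id not in api_ids:
        raise ValueError(f"unknown API: {api_id}")
    return [1.0 if candidate == api_id else 0.0 for candidate in api_ids]


def build_model_input(
    smoothed_window: Sequence[float],
    api_id: str,
    homogeneous: Dict[str, float],
) -> List[float]:
    """Concatenate smoothed metrics, pilot signal, and homogeneous inputs."""
    extras = [float(homogeneous[key]) for key in HOMOGENEOUS_KEYS]
    return list(smoothed_window) + one_hot_pilot(api_id) + extras


features = build_model_input(
    [110.0, 210.0, 205.0],
    "checkout",
    {"request_size": 512, "request_key": 7, "response_size": 2048, "response_key": 3},
)
print(features)
```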
The LSTM model output is then provided to a neural network input at step 350. The model outputs, which can include metrics, predicted values, and a regenerated pilot signal, are generated at step 355. The model may output a predicted metric at the current time T, predicted values up until a time T plus K, and regenerated pilot signals. The model outputs are generated based on the smoothed signal, homogeneous signals, and the pilot signal. In some instances, weights within the model are optimized such that different weight combinations may cater to different APIs. For example, the model and neural network may collectively imitate a neural network per API source that is embedded within the LSTM architecture. The model and the neural network may have parameters that are determined using hyperparameter tuning, for which the system uses Bayesian optimization.
A determination is made as to whether the neural network system prediction matches the original pilot signal value at step 360. If the predicted pilot signal does not match a received pilot signal, then the metric prediction is discarded at step 365. The metric data is discarded because the generated data cannot be identified to match the source API from which the metric data was generated.
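The gating logic of steps 360 and 365 might look like the following sketch. It assumes, purely for illustration, that the received pilot signal is a one-hot vector and that the regenerated pilot signal is a vector of scores, so that "matching" means both vectors peak at the same index; the function names are hypothetical.

```python
from typing import Optional, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest element (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def gate_prediction(
    received_pilot: Sequence[float],
    regenerated_pilot: Sequence[float],
    predicted_metric: float,
) -> Optional[float]:
    """Return the metric prediction only if the pilot signal was regenerated correctly."""
    if len(received_pilot) != len(regenerated_pilot):
        raise ValueError("pilot signals must have the same length")
    if argmax(received_pilot) != argmax(regenerated_pilot):
        return None  # discard: prediction cannot be tied to the source API
    return predicted_metric


kept = gate_prediction([0.0, 1.0, 0.0], [0.1, 0.8, 0.1], 205.0)
dropped = gate_prediction([0.0, 1.0, 0.0], [0.7, 0.2, 0.1], 205.0)
print(kept, dropped)
```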
If the predicted pilot signal value matches the received pilot signal, the smoothed timeseries metric data is transformed back into a raw timeseries metric data stream at step 370. Transforming the smoothed timeseries metric data back into the raw timeseries metric data can be performed using the same technique that was used to transform the raw data into the smoothed signal domain.
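One possible reading of step 370, assuming the smoothing was a trailing moving average over a full window of w samples, is to invert the average algebraically: since the smoothed value is the mean of the last w raw values, the newest raw value equals w times the smoothed value minus the sum of the previous w - 1 raw values. The sketch below applies that identity; the function name and sample data are hypothetical, and other smoothing techniques would need a different inverse.

```python
from typing import Sequence


def unsmooth_latest(smoothed_value: float, previous_raw: Sequence[float], window: int) -> float:
    """Recover the newest raw value from a trailing moving average over a full window."""
    if window < 1:
        raise ValueError("window must be >= 1")
    if len(previous_raw) < window - 1:
        raise ValueError("need at least window - 1 previous raw values")
    # The last window - 1 raw values that shared the window with the newest one.
    tail = list(previous_raw)[len(previous_raw) - (window - 1):]
    return window * smoothed_value - sum(tail)


# Hypothetical history of raw values, and a smoothed prediction for the next point.
previous_raw = [100.0, 120.0, 110.0, 400.0]
raw_prediction = unsmooth_latest(205.0, previous_raw, window=3)
print(raw_prediction)
```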
A determination of whether the timeseries metric data is anomalous is automatically performed at step 375. Automatically determining whether the timeseries data is anomalous may include comparing actual metric values to predicted metric values, and processing the discrepancies between the two values. Automatically determining whether the timeseries metric data is anomalous at step 375 is discussed in more detail below with respect to the method of
An average of the metrics is calculated at step 440. In some instances, the average API response time, API errors per time period, and API calls per time period can be calculated as an average over the same time period. In some instances, the API metrics may be averaged over different time periods. The averaged metrics may be concatenated at step 450. Data sets may then be standardized at step 460.
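Steps 440 through 460 can be sketched as three small functions. This is an illustrative reading only: it assumes fixed-size averaging buckets, interprets concatenation as placing the averaged metrics side by side in one row per time period, and uses z-score standardization per column. None of these choices, nor the sample values, are stated in the description.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def bucket_average(samples: Sequence[float], bucket: int) -> List[float]:
    """Average consecutive groups of `bucket` samples (step 440)."""
    return [mean(samples[i:i + bucket]) for i in range(0, len(samples), bucket)]


def concatenate(*series: Sequence[float]) -> List[List[float]]:
    """Place the averaged metrics side by side, one row per time period (step 450)."""
    return [list(row) for row in zip(*series)]


def standardize_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each column to zero mean and unit variance (step 460)."""
    scaled_columns = []
    for column in zip(*rows):
        mu, sigma = mean(column), pstdev(column)
        scaled_columns.append([0.0 if sigma == 0 else (v - mu) / sigma for v in column])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical per-interval metrics for one API.
response_ms = bucket_average([100.0, 120.0, 110.0, 130.0], bucket=2)
errors = bucket_average([0.0, 2.0, 1.0, 1.0], bucket=2)
calls = bucket_average([50.0, 70.0, 60.0, 80.0], bucket=2)

rows = concatenate(response_ms, errors, calls)
standardized = standardize_columns(rows)
print(standardized)
```

A constant column (here the error rate) is mapped to zeros rather than dividing by a zero standard deviation.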
A determination is made as to whether the discrepancy between the metric values indicates that a majority of the APIs are anomalous at step 620. In some instances, if there is no large discrepancy between the values, the API may be categorized as not anomalous. In some instances, multiple groups of APIs can be predicted to determine if a set of APIs or an API family should be considered anomalous. The present system, in some instances, tries to minimize alerts sent to system administrators unless the majority of APIs in an API family are anomalous and a significant concern exists. In some instances, if the discrepancy between the values is greater than 40%, 50%, or 60%, the discrepancy is sufficient to trigger an anomaly.
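The majority rule described for step 620 can be sketched as follows. The sketch assumes the discrepancy is measured relative to the predicted value and uses a 50% threshold, one of the example values given above; the function names, API names, and metric values are hypothetical.

```python
from typing import Dict, Tuple


def relative_discrepancy(actual: float, predicted: float) -> float:
    """Absolute difference as a fraction of the predicted value."""
    denominator = max(abs(predicted), 1e-9)  # guard against division by zero
    return abs(actual - predicted) / denominator


def family_is_anomalous(
    metrics: Dict[str, Tuple[float, float]],
    threshold: float = 0.5,
) -> bool:
    """True when more than half of the APIs exceed the discrepancy threshold."""
    flagged = sum(
        1 for actual, predicted in metrics.values()
        if relative_discrepancy(actual, predicted) > threshold
    )
    return flagged > len(metrics) / 2


# Hypothetical (actual, predicted) response times per API in one family.
checkout_family = {
    "add_to_cart": (400.0, 200.0),
    "checkout": (330.0, 200.0),
    "payment": (210.0, 200.0),
}
alert = family_is_anomalous(checkout_family)
print(alert)
```

Requiring a majority before alerting reflects the stated goal of limiting alerts to cases where most of an API family misbehaves.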
If the discrepancy between the values indicates that a majority of APIs in an API family are anomalous, the API is categorized as anomalous based on the size of the discrepancy at step 640. For example, an API family may be categorized as slightly anomalous, very anomalous, or extremely anomalous. The different anomalous levels may be based on the standard deviation. For example, a first deviation may be associated with no anomaly, a second deviation may be associated with a slight anomaly, a third deviation may be associated with very anomalous, and a fourth deviation may be associated with extremely anomalous.
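The standard-deviation levels of step 640 could be realized along the lines of the sketch below. It assumes that "first", "second", "third", and "fourth" deviation mean a discrepancy within one, two, or three standard deviations, or beyond three; that mapping, the function name, and the sample numbers are an interpretation rather than something the description states.

```python
def categorize(actual: float, predicted: float, sigma: float) -> str:
    """Map a discrepancy, measured in standard deviations, to an anomaly level."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    deviations = abs(actual - predicted) / sigma
    if deviations <= 1:
        return "not anomalous"
    if deviations <= 2:
        return "slightly anomalous"
    if deviations <= 3:
        return "very anomalous"
    return "extremely anomalous"


# Hypothetical predicted response time of 200 ms with a standard deviation of 20 ms.
levels = [categorize(actual, 200.0, 20.0) for actual in (210.0, 235.0, 255.0, 400.0)]
print(levels)
```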
The components shown in
Mass storage device 730, which may be implemented with a magnetic disk drive, an optical disk drive, a flash drive, or other device, is a non-volatile storage device for storing data and instructions for use by processor unit 710. Mass storage device 730 can store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 720.
Portable storage device 740 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or digital video disc, USB drive, memory card or stick, or other portable or removable memory, to input and output data and code to and from the computer system 700 of
Input devices 760 provide a portion of a user interface. Input devices 760 may include an alpha-numeric keypad, such as a keyboard, for inputting alpha-numeric and other information, a pointing device such as a mouse, a trackball, stylus, cursor direction keys, microphone, touch-screen, accelerometer, and other input devices. Additionally, the system 700 as shown in
Display system 770 may include a liquid crystal display (LCD) or other suitable display device. Display system 770 receives textual and graphical information and processes the information for output to the display device. Display system 770 may also receive input as a touch-screen.
Peripherals 780 may include any type of computer support device to add additional functionality to the computer system. For example, peripheral device(s) 780 may include a modem, a router, a printer, or other device.
The system 700 may also include, in some implementations, antennas, radio transmitters and radio receivers 790. The antennas and radios may be implemented in devices such as smart phones, tablets, and other devices that may communicate wirelessly. The one or more antennas may operate at one or more radio frequencies suitable to send and receive data over cellular networks, Wi-Fi networks, commercial device networks such as a Bluetooth device, and other radio frequency networks. The devices may include one or more radio transmitters and receivers for processing signals sent and received using the antennas.
The components contained in the computer system 700 of
The foregoing detailed description of the technology herein has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims appended hereto.
Claims
1. A method for forecasting values for system metric time series, comprising:
- receiving raw time series metric data by an application on an application server, the raw time series metric data associated with an API provided by a remote server, the raw time metric data associated with request and responses to the API from a plurality of remote devices in communication with the remote server;
- transforming, by the application on the application server, the received raw time series metric data into smoothed time series metric data;
- passing the smoothed time series metric data through a prediction model, the prediction model trained using training data prior to receiving smoothed time series metric data;
- passing one or more homogeneous variables associated with the API through the prediction model;
- generating a prediction for the time series metric data in a smoothed format, the prediction based on the smoothed time series metric data and the one or more homogeneous variables associated with the API;
- transforming the smoothed predicted time series metric data into a raw predicted time series metric value; and
- analyzing the raw predicted time series metric value to determine whether raw time series metric data is anomalous.
2. The method of claim 1, wherein the raw time series metric data is associated with a plurality of APIs.
3. The method of claim 2, wherein the determination whether the raw time series metric data is anomalous is made based on the predicted value for each of a majority of the plurality of APIs.
4. The method of claim 1, further comprising:
- receiving a pilot signal to identify a particular API of the plurality of APIs; and
- generating a prediction of the pilot signal based on the output of the prediction model.
5. The method of claim 1, further comprising:
- passing the output of the prediction model through a neural network, the generated prediction for the time series metric data and the generated prediction for the pilot signal being generated by the neural network.
6. The method of claim 1, wherein transforming includes using a moving average of the metric time series.
7. The method of claim 1, wherein the prediction model includes a long short-term memory model.
8. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for automatically forecasting values for system metric time series, the method comprising:
- receiving raw time series metric data by an application on an application server, the raw time series metric data associated with an API provided by a remote server, the raw time metric data associated with request and responses to the API from a plurality of remote devices in communication with the remote server;
- transforming, by the application on the application server, the received raw time series metric data into smoothed time series metric data;
- passing the smoothed time series metric data through a prediction model, the prediction model trained using training data prior to receiving smoothed time series metric data;
- passing one or more homogeneous variables associated with the API through the prediction model;
- generating a prediction for the time series metric data in a smoothed format, the prediction based on the smoothed time series metric data and the one or more homogeneous variables associated with the API;
- transforming the smoothed predicted time series metric data into a raw predicted time series metric value; and
- analyzing the raw predicted time series metric value to determine whether raw time series metric data is anomalous.
9. The non-transitory computer readable storage medium of claim 8, wherein the raw time series metric data is associated with a plurality of APIs.
10. The non-transitory computer readable storage medium of claim 9, wherein the determination whether the raw time series metric data is anomalous is made based on the predicted value for each of a majority of the plurality of APIs.
11. The non-transitory computer readable storage medium of claim 8, the method further comprising:
- receiving a pilot signal to identify a particular API of the plurality of APIs; and
- generating a prediction of the pilot signal based on the output of the prediction model.
12. The non-transitory computer readable storage medium of claim 8, the method further comprising:
- passing the output of the prediction model through a neural network, the generated prediction for the time series metric data and the generated prediction for the pilot signal being generated by the neural network.
13. The non-transitory computer readable storage medium of claim 8, wherein transforming includes using a moving average of the metric time series.
14. The non-transitory computer readable storage medium of claim 8, wherein the prediction model includes a long short-term memory model.
15. A system for automatically forecasting values for system metric time series, comprising:
- a server including a memory and a processor; and
- one or more modules stored in the memory and executed by the processor to receive raw time series metric data by an application on an application server, the raw time series metric data associated with an API provided by a remote server, the raw time metric data associated with request and responses to the API from a plurality of remote devices in communication with the remote server, transform, by the application on the application server, the received raw time series metric data into smoothed time series metric data, pass the smoothed time series metric data through a prediction model, the prediction model trained using training data prior to receiving smoothed time series metric data, pass one or more homogeneous variables associated with the API through the prediction model, generate a prediction for the time series metric data in a smoothed format, the prediction based on the smoothed time series metric data and the one or more homogeneous variables associated with the API, transform the smoothed predicted time series metric data into a raw predicted time series metric value, and analyze the raw predicted time series metric value to determine whether raw time series metric data is anomalous.
16. The system of claim 15, wherein the raw time series metric data is associated with a plurality of APIs.
17. The system of claim 16, wherein the determination whether the raw time series metric data is anomalous is made based on the predicted value for each of a majority of the plurality of APIs.
18. The system of claim 15, the modules further executable to receive a pilot signal to identify a particular API of the plurality of APIs, and generate a prediction of the pilot signal based on the output of the prediction model.
19. The system of claim 15, the modules further executable to pass the output of the prediction model through a neural network, the generated prediction for the time series metric data and the generated prediction for the pilot signal being generated by the neural network.
20. The system of claim 15, wherein transforming includes using a moving average of the metric time series.
Type: Application
Filed: Jun 16, 2021
Publication Date: Oct 6, 2022
Applicant: Traceable Inc. (San Francisco, CA)
Inventors: Ravindra Guntar (Hyderabad), Ranaji Krishna (Berkeley, CA)
Application Number: 17/348,785