DISTRIBUTED MODEL EXECUTION
Distributed model execution, including: identifying, for each model of a plurality of models, based on one or more execution constraints for the plurality of models, a corresponding node of a plurality of nodes, wherein the plurality of nodes each comprise one or more computing devices or one or more virtual machines; deploying each model of the plurality of models to the identified corresponding node of the plurality of nodes; and wherein the plurality of models are configured to generate, based on data input to at least one model of the plurality of models, a prediction associated with the data.
This application is a non-provisional application for patent entitled to a filing date and claiming the benefit of earlier-filed U.S. Provisional Patent Application Ser. No. 62/976,965, filed Feb. 14, 2020.
This application is related to co-pending U.S. patent application docket Ser. No. SC0010US01, filed Feb. 16, 2021, and co-pending U.S. patent application docket Ser. No. SC0011US01, filed Feb. 16, 2021, each of which is incorporated by reference in its entirety.
BACKGROUND
Machine learning models may be used to perform various data analysis applications. For example, one or more machine learning models may be used to generate predictions or other analysis based on input data. Machine learning models may be logically integrated such that the output of some models are provided as input to other models, ultimately resulting in a model providing an output as the prediction.
A client or consumer may not have the hardware or software resources available on-premises to perform computationally intensive predictions or handle large amounts of data. To address these shortcomings, a client may provide the machine learning models used to generate a prediction to off-site or remote resources, such as remote data centers, cloud computing environments, and the like. These remote resources may have access to hardware such as Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), or other devices that the models may leverage to accelerate their performance. The resulting prediction or output may then be provided back to a client.
As will be described in more detail below, the execution of a given model may be performed by a node. Such a node may include a computing device, a virtual machine, or other device as can be appreciated. The models may be deployed for execution to a given node based on various criteria, including the hardware and software resources available to a node, the type of data or calculations used by the model, authorization requirements, model dependencies, and the like. Once deployed, the models may be used for distributed processing of data in order to generate a prediction for a client.
The system of
Also included in the system of
The plurality of models 110a-n may include machine learning models (e.g., trained machine learning models such as neural networks), algorithmic models, and the like each configured to provide some output based on some input data. In aggregate, the plurality of models 110a-n are configured to generate a prediction based on input to one or more of the models 110a-n. Such predictions may include, for example, classifications for a classification problem, a numerical value for a regression problem, and the like. The plurality of models 110a-n may also output one or more confidence values associated with the prediction. Accordingly, each model 110a-n is configured to receive, as input, data output by another model 110a-n, provide output as input data to another model 110a-n, or both.
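For illustration only, the chaining of model outputs into a final prediction described above can be sketched as follows; the model names, transformations, and threshold are hypothetical and not part of the disclosed system:

```python
# Minimal sketch of chained models: the output of each upstream model is
# provided as input to a downstream model, and the final model emits the
# prediction. All names and computations here are hypothetical.

def model_a(x):
    # Upstream model: normalizes the raw input value.
    return x / 10.0

def model_b(x):
    # Intermediate model: applies a learned offset to the normalized value.
    return x + 0.5

def model_c(x):
    # Final model: emits a classification as the prediction.
    return "high" if x > 1.0 else "low"

def run_pipeline(data):
    # Outputs flow from model_a to model_b to model_c.
    return model_c(model_b(model_a(data)))

print(run_pipeline(9.0))  # 9.0 -> 0.9 -> 1.4 -> "high"
```

In an actual deployment each function would execute on a different node, with the intermediate values transferred between nodes rather than passed in-process.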
Consider the example graph representations of model dependencies shown in
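Such a graph of model dependencies can be sketched, for illustration, as a directed acyclic graph whose edges point from each model to the models whose output it consumes; a topological ordering of that graph then yields a valid execution order. The model names below are hypothetical:

```python
from graphlib import TopologicalSorter

# Illustrative dependency graph: each key maps to the set of models whose
# output it consumes. All model names are hypothetical.
dependencies = {
    "model_c": {"model_a", "model_b"},  # model_c consumes outputs of a and b
    "model_b": {"model_a"},             # model_b consumes the output of a
    "model_a": set(),                   # model_a consumes only external input
}

# A topological ordering gives a valid execution order: every model is
# scheduled after the models it depends on.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)  # model_a precedes model_b, which precedes model_c
```

A cycle in the graph would indicate an unsatisfiable dependency, which `TopologicalSorter` reports by raising an exception.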
Turning back to
The management module 104 identifies the nodes 108a-n for each model 110a-n based on one or more execution constraints. The one or more execution constraints for a given model 110a-n are requirements to be satisfied by a given node 108a-n in order to execute the given model 110a-n. The one or more execution constraints may include required constraints, where a node 108a-n must satisfy a particular constraint for a given model 110a-n to be deployed there. The one or more execution constraints may also include preferential constraints, where a node 108a-n is more preferentially selected for deployment of a given model 110a-n if the constraint is satisfied.
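The distinction between required and preferential constraints can be sketched, for illustration, as follows; the field names (`private`, `gpu`) and the dictionary representation are assumptions chosen for the example, not a fixed schema of the system:

```python
# Sketch: required constraints gate eligibility outright, while preferential
# constraints only bias the selection. Field names are illustrative.

def node_eligible(node, required):
    """A node is eligible only if it satisfies every required constraint."""
    return all(node.get(key) == value for key, value in required.items())

def preference_score(node, preferred):
    """Count how many preferential constraints the node satisfies."""
    return sum(node.get(key) == value for key, value in preferred.items())

node = {"private": True, "gpu": False}
required = {"private": True}   # must hold for the model to be deployed here
preferred = {"gpu": True}      # satisfying this makes the node more favored

print(node_eligible(node, required))      # True: privacy requirement is met
print(preference_score(node, preferred))  # 0: no GPU, so no preference bonus
```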
The one or more execution constraints may include one or more model dependencies. For example, turning back to the example of
The one or more execution constraints may also include one or more encryption constraints. The one or more encryption constraints may indicate that data input to or received from a given model 110a-n must be encrypted if transferred over a network. The one or more encryption constraints may also indicate that data input to or received from a given model 110a-n must be encrypted regardless of whether it is transferred over a network (e.g., if the source and destination models 110a-n are executed in a same node 108a-n, or executed within different virtual machine nodes 108a-n implemented in a same hardware environment). The one or more encryption constraints may indicate a type of encryption to be used (e.g., symmetric vs. asymmetric, particular algorithms, and the like). Accordingly, a node 108a-n may be selected based on an encryption constraint by selecting a node 108a-n having hardware accelerators, processors, or other resources to facilitate satisfaction of the particular encryption constraints. For example, a model 110a-n whose output must be encrypted may be preferentially deployed to a node 108a-n having greater hardware or processing resources, while a model 110a-n that needs to neither encrypt output nor decrypt input may be preferentially deployed to a node 108a-n having lesser hardware or processing resources. As a further example, a model 110a-n whose input must be decrypted and whose output must be encrypted may be preferentially deployed to a node 108a-n having even greater hardware or processing resources.
The one or more execution constraints may also include one or more authorization constraints. An authorization constraint is a restriction on which entities have access to data input to a model 110a-n, output by a model 110a-n, generated by the model 110a-n (e.g., intermediary data or calculations), and the like. For example, an authorization constraint may indicate that a model 110a-n should be executed on a private node 108a-n (e.g., a node 108a-n not shared by or accessible to another tenant or client of the model execution environment 106). As a further example, an authorization constraint may define access privileges for those users or other entities that may access the node 108a-n executing a given model 110a-n. As another example, an authorization constraint may indicate that the input to or output from a given model 110a-n should be transferred only over a private network. Accordingly, the node 108a-n for the given model 110a-n should be selected as having access to a private network connection to nodes 108a-n executing its dependent models 110a-n.
The management module 104 may also identify the nodes 108a-n for each model 110a-n based on one or more node characteristics. Node characteristics for a given node 108a-n may include hardware resources for the node 108a-n. Such hardware resources may include storage devices, memory (e.g., random access memory (RAM)), processors, hardware accelerators, network interfaces, and the like. Software resources may include particular operating systems, software libraries, applications, and the like. For example, a model 110a-n processing highly-dimensional data or large amounts of data at a time may be preferentially deployed to a node 108a-n having more RAM than another node 108a-n. As another example, a model 110a-n that uses a particular encryption algorithm for encrypting output data or decrypting input data may be preferentially deployed to a node 108a-n having the requisite libraries for performing the algorithm installed.
The management module 104 may also identify the nodes 108a-n for each model 110a-n based on one or more model characteristics. The model characteristics for a given model 110a-n describe the data acted upon and the calculations performed by the model 110a-n. For example, model characteristics may include a data type for data input to the model 110a-n. A data type for input data may describe a type of value included in the input data (e.g., integer, floating point, bytes, and the like). A data type for input data may also describe a data structure or class of the input data (e.g., single values, multidimensional data structures, labeled or unlabeled data, time series data, and the like). Model characteristics may also include types of calculations or transformations performed by the model (e.g., arithmetic calculations, floating point operations, matrix operations, Boolean operations, and the like). For example, models 110a-n performing complex matrix operations on multidimensional floating point data may be preferentially deployed to nodes 108a-n with GPUs, FPGAs, or other hardware accelerators to facilitate execution of such operations. As another example, a neural network model may be deployed to different node(s) based on architectural parameters, such as whether the neural network is a feed-forward network or a recurrent network, whether the neural network exhibits “memory” (e.g., via long short-term memory (LSTM) architecture), etc.
Identifying, for each model 110a-n of a plurality of models 110a-n, based on one or more execution constraints for the plurality of models 110a-n, a corresponding node 108a-n of a plurality of nodes 108a-n may include calculating, for each model 110a-n of the plurality of models 110a-n, a plurality of fitness scores for each of the plurality of nodes 108a-n. In other words, a given model 110a-n has a fitness score calculated for each of the nodes 108a-n indicating a fitness of that node 108a-n for the given model 110a-n. Each fitness score for a given model may be calculated based on a degree to which the node 108a-n satisfies the execution constraints for the model 110a-n.
A node 108a-n may receive a higher fitness score for satisfying an execution constraint to a greater degree than another node 108a-n. For example, assume that a first model 110a-n is dependent on a second model 110a-n (e.g., for input by virtue of receiving input from the second model 110a-n, or output by virtue of providing output to the second model 110a-n), and that the first model 110a-n is selected for deployment to a first node 108a-n. Further assume that a second node 108a-n and a third node 108a-n are both communicatively coupled to the first node 108a-n, with the second node 108a-n having a lower latency connection to the first node 108a-n compared to a connection from the third node 108a-n to the first node 108a-n. Accordingly, the second model 110a-n would have a higher fitness score for the second node 108a-n than the third node 108a-n by virtue of the lower latency connection to the first node 108a-n to which the dependent first model 110a-n is to be deployed.
A node 108a-n may receive a null or zero fitness score for failing to satisfy a required execution constraint. For example, assume that a given model 110a-n must be executed on a private node 108a-n. Any nodes 108a-n accessible to other tenants may receive a zero fitness score for failing to meet the privacy requirement.
The fitness score may also be calculated based on the node characteristics of each node 108a-n or the model characteristics of the model 110a-n. For example, a model 110a-n performing calculations on highly-dimensional data may assign a higher fitness score to nodes 108a-n with greater RAM. As another example, nodes 108a-n having advanced processors or hardware accelerators may not receive higher fitness scores for models 110a-n acting on low-dimensional data or performing simpler arithmetic calculations, as such hardware resources would not provide a meaningful benefit when compared to other models 110a-n.
The management module 104 may then select, for each model 110a-n, based on the plurality of fitness scores, the corresponding node 108a-n (e.g., the node 108a-n to which a given model 110a-n will be deployed). In some embodiments, selecting, based on the plurality of fitness scores, the corresponding node 108a-n includes selecting, for each model 110a-n, a highest scoring node 108a-n. For example, a node 108a-n may be selected for each model 110a-n by traversing a listing or ordering of models 110a-n and selecting a node 108a-n for a currently selected model 110a-n. In some embodiments, after a model 110a-n is assigned to a given node 108a-n, the fitness scores for the given node 108a-n may be recalculated for each model 110a-n not having an assigned node 108a-n. Accordingly, in some embodiments, a node 108a-n already having an assigned model 110a-n may still be an optimal selection for deploying another model 110a-n. In other embodiments, selecting, based on the plurality of fitness scores, the corresponding node 108a-n includes generating multiple combinations or permutations of assigning models 110a-n for deployment to nodes 108a-n and calculating a best fit assignment for all of the plurality of models 110a-n (e.g., an assignment with a highest total fitness score across all models 110a-n).
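The greedy variant of this selection, in which fitness scores are recalculated as nodes are consumed, can be sketched as follows; the scoring weights, field names, and models are hypothetical illustrations, not the disclosed scoring function:

```python
# Sketch of greedy assignment: score every node for the current model,
# zero out nodes that fail a required constraint, pick the best node,
# then shrink its remaining resources so later scores are recalculated.

def fitness(model, node):
    if model["needs_private"] and not node["private"]:
        return 0.0                        # required constraint unmet
    score = 1.0
    if model["heavy_math"] and node["has_gpu"]:
        score += 2.0                      # accelerator bonus (illustrative)
    score += node["free_ram_gb"] / 64.0   # more free RAM, mildly better
    return score

def assign(models, nodes):
    assignment = {}
    for m in models:
        best = max(nodes, key=lambda n: fitness(m, n))
        if fitness(m, best) == 0.0:
            raise RuntimeError(f"no eligible node for {m['name']}")
        assignment[m["name"]] = best["name"]
        best["free_ram_gb"] -= m["ram_gb"]  # basis for later scores shrinks
    return assignment

models = [
    {"name": "m1", "needs_private": True, "heavy_math": False, "ram_gb": 8},
    {"name": "m2", "needs_private": False, "heavy_math": True, "ram_gb": 16},
]
nodes = [
    {"name": "n1", "private": True, "has_gpu": False, "free_ram_gb": 32},
    {"name": "n2", "private": False, "has_gpu": True, "free_ram_gb": 64},
]
result = assign(models, nodes)
print(result)  # m1 requires privacy (n1); m2 favors the GPU node (n2)
```

The best-fit alternative described above would instead enumerate candidate assignments and keep the one with the highest total fitness across all models.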
The management module 104 then deploys each model 110a-n of the plurality of models 110a-n to the identified corresponding node 108a-n of the plurality of nodes 108a-n. Deploying each model 110a-n may include sending one or more of the models 110a-n to their respective assigned node 108a-n. Deploying each model 110a-n may also include causing a node 108a-n to acquire or load its assigned model 110a-n. For example, the management module 104 may issue a command for a given node 108a-n to load its assigned model 110a-n from a local or remote storage location.
Deploying each model 110a-n may also include configuring one or more models 110a-n to receive input from one or more data sources (e.g., data sources other than another model 110a-n). For example, the management module 104 may provide, to a node 108a-n of a given model 110a-n, network addresses (e.g., Uniform Resource Locators (URLs), Internet Protocol (IP) addresses) or other identifiers for data sources of input data to the given model 110a-n. For example, the management module 104 may provide a node 108a-n a URL or IP address for a data stream of data to be provided as input to the given model 110a-n. As another example, the management module 104 may provide a node a URL, IP address, memory address, or file path to stored data to be provided as input to the given model 110a-n. The management module 104 may also provide authentication credentials, login credentials, or other data facilitating access to stored data or data streams.
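Such a data-source configuration could, for illustration, be serialized and sent to a node as a simple message; every field name, URL, and path below is a hypothetical example rather than a defined schema of the system:

```python
import json

# Hypothetical deployment message telling a node where its model's input
# data lives and how to authenticate. All values are illustrative.
source_config = {
    "model": "model_a",
    "input_stream": "https://data.example.com/stream/42",  # hypothetical URL
    "fallback_file": "/mnt/inputs/batch.parquet",          # hypothetical path
    "auth": {"token_env_var": "DATA_STREAM_TOKEN"},        # credential handle
}

# The management module would transmit this to the node; the node parses it
# to configure its model's input source.
message = json.dumps(source_config)
print(json.loads(message)["input_stream"])
```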
Deploying each model 110a-n may also include configuring one or more models 110a-n to provide, as output, a prediction generated by the plurality of models 110a-n. For example, the management module 104 may indicate a storage location or file path for output data. The management module 104 may further provide an indication of the storage location of the output data to the client device 112.
In some embodiments, deploying each model 110a-n includes configuring each node 108a-n to communicate with at least one other node 108a-n of the plurality of nodes 108a-n. Thus, each interdependent model 110a-n may communicate with each other via the configured nodes 108a-n. For example, the management module 104 may facilitate the exchange of encryption keys between nodes 108a-n executing dependent models 110a-n requiring encryption. As another example, the management module 104 may provide, to nodes 108a-n executing models 110a-n having dependent models 110a-n, the URLs, IP addresses, or other identifiers of the nodes 108a-n executing their respective dependent models 110a-n. In some embodiments, the management module 104 may allocate or generate communications pathways between nodes 108a-n via network communications fabrics of the model execution environment 106. In some embodiments, the management module 104 may configure Application Program Interface (API) calls or queries facilitating communications between any of the nodes 108a-n.
A prediction may then be generated by the deployed models 110a-n. For example, input data may be provided to one or more of the models 110a-n. A prediction may then be generated as an output of a model 110a-n by virtue of the distributed and interdependent execution of the models 110a-n in the nodes 108a-n. Data indicating the prediction may then be provided or made accessible to the client device 112.
By deploying the models 110a-n to the nodes 108a-n of the model execution environment 106 as described above, the management module 104 ensures a usable configuration of models 110a-n as deployed to nodes 108a-n. Moreover, the management module 104 ensures that the model 110a-n deployment preserves the hierarchy of dependencies of models 110a-n, as well as the encryption and authorization requirements of the models 110a-n.
For further explanation,
The execution environment 300 depicted in
The execution environment 300 depicted in
The execution environment 300 depicted in
The execution environment 300 depicted in
For further explanation,
The nodes 108a-n for each model 110a-n are identified based on one or more execution constraints. The one or more execution constraints for a given model 110a-n are requirements to be satisfied by a given node 108a-n in order to execute the given model 110a-n. The one or more execution constraints may include required constraints, where a node 108a-n must satisfy a particular constraint for a given model 110a-n to be deployed there. The one or more execution constraints may also include preferential constraints, where a node 108a-n is more preferentially selected for deployment of a given model 110a-n if the constraint is satisfied.
The one or more execution constraints may include one or more model dependencies. For example, turning back to the example of
The one or more execution constraints may also include one or more encryption constraints. The one or more encryption constraints may indicate that data input to or received from a given model 110a-n must be encrypted if transferred over a network. The one or more encryption constraints may also indicate that data input to or received from a given model 110a-n must be encrypted regardless of whether it is transferred over a network (e.g., if the source and destination models 110a-n are executed in a same node 108a-n, or executed within different virtual machine nodes 108a-n implemented in a same hardware environment). The one or more encryption constraints may indicate a type of encryption to be used (e.g., symmetric vs. asymmetric, particular algorithms, and the like). Accordingly, a node 108a-n may be selected based on an encryption constraint by selecting a node 108a-n having hardware accelerators, processors, or other resources to facilitate satisfaction of the particular encryption constraints. For example, a model 110a-n whose output must be encrypted may be preferentially deployed to a node 108a-n having greater hardware or processing resources, while a model 110a-n that needs to neither encrypt output nor decrypt input may be preferentially deployed to a node 108a-n having lesser hardware or processing resources. As a further example, a model 110a-n whose input must be decrypted and whose output must be encrypted may be preferentially deployed to a node 108a-n having even greater hardware or processing resources.
The one or more execution constraints may also include one or more authorization constraints. An authorization constraint is a restriction on which entities have access to data input to a model 110a-n, output by a model 110a-n, generated by the model 110a-n (e.g., intermediary data or calculations), and the like. For example, an authorization constraint may indicate that a model 110a-n should be executed on a private node 108a-n (e.g., a node 108a-n not shared by or accessible to another tenant or client of the model execution environment 106). As a further example, an authorization constraint may define access privileges for those users or other entities that may access the node 108a-n executing a given model 110a-n. As another example, an authorization constraint may indicate that the input to or output from a given model 110a-n should be transferred only over a private network. Accordingly, the node 108a-n for the given model 110a-n should be selected as having access to a private network connection to nodes 108a-n executing its dependent models 110a-n.
The management module 104 may also identify the nodes 108a-n for each model 110a-n based on one or more node characteristics. Node characteristics for a given node 108a-n may include hardware resources for the node 108a-n. Such hardware resources may include storage devices, memory (e.g., random access memory (RAM)), processors, hardware accelerators, network interfaces, and the like. Software resources may include particular operating systems, software libraries, applications, and the like. For example, a model 110a-n processing highly-dimensional data or large amounts of data at a time may be preferentially deployed to a node 108a-n having more RAM than another node 108a-n. As another example, a model 110a-n that uses a particular encryption algorithm for encrypting output data or decrypting input data may be preferentially deployed to a node 108a-n having the requisite libraries for performing the algorithm installed.
The management module 104 may also identify the nodes 108a-n for each model 110a-n based on one or more model characteristics. The model characteristics for a given model 110a-n describe the data acted upon and the calculations performed by the model 110a-n. For example, model characteristics may include a data type for data input to the model 110a-n. A data type for input data may describe a type of value included in the input data (e.g., integer, floating point, bytes, and the like). A data type for input data may also describe a data structure or class of the input data (e.g., single values, multidimensional data structures, and the like). Model characteristics may also include types of calculations or transformations performed by the model (e.g., arithmetic calculations, floating point operations, matrix operations, Boolean operations, and the like). For example, models 110a-n performing complex matrix operations on multidimensional floating point data may be preferentially deployed to nodes 108a-n with GPUs, FPGAs, or other hardware accelerators to facilitate execution of such operations.
The method of
Deploying 404 each model 110a-n may also include configuring one or more models 110a-n to receive input from one or more data sources (e.g., data sources other than another model 110a-n). For example, the management module 104 may provide, to a node 108a-n of a given model 110a-n, network addresses (e.g., Uniform Resource Locators (URLs), Internet Protocol (IP) addresses) or other identifiers for data sources of input data to the given model 110a-n. For example, the management module 104 may provide a node 108a-n a URL or IP address for a data stream of data to be provided as input to the given model 110a-n. As another example, the management module 104 may provide a node a URL, IP address, memory address, or file path to stored data to be provided as input to the given model 110a-n. The management module 104 may also provide authentication credentials, login credentials, or other data facilitating access to stored data or data streams.
Deploying 404 each model 110a-n may also include configuring one or more models 110a-n to provide, as output, a prediction generated by the plurality of models 110a-n. For example, the management module 104 may indicate a storage location or file path for output data. The management module 104 may further provide an indication of the storage location of the output data to the client device 112.
One skilled in the art will appreciate that the approaches set forth above with respect to
For further explanation,
The method of FIG. 5 differs from
A node 108a-n may receive a higher fitness score for satisfying an execution constraint to a greater degree than another node 108a-n. For example, assume that a first model 110a-n is dependent on a second model 110a-n (e.g., for input or output), and that the first model 110a-n is selected for deployment to a first node 108a-n. Further assume that a second node 108a-n and a third node 108a-n are both communicatively coupled to the first node 108a-n, with the second node 108a-n having a lower latency connection to the first node 108a-n compared to a connection from the third node 108a-n to the first node 108a-n. Accordingly, the second model 110a-n would have a higher fitness score for the second node 108a-n than the third node 108a-n by virtue of the lower latency connection to the first node 108a-n to which the dependent first model 110a-n is to be deployed.
A node 108a-n may receive a null or zero fitness score for failing to satisfy a required execution constraint. For example, assume that a given model 110a-n must be executed on a private node 108a-n. Any nodes 108a-n accessible to other tenants may receive a zero fitness score for failing to meet the privacy requirement.
The fitness score may also be calculated based on the node characteristics of each node 108a-n or the model characteristics of the model 110a-n. For example, a model 110a-n performing calculations on highly-dimensional data may assign a higher fitness score to nodes 108a-n with greater RAM. As another example, nodes 108a-n having advanced processors or hardware accelerators may not receive higher fitness scores for models 110a-n acting on low-dimensional data or performing simpler arithmetic calculations, as such hardware resources would not provide a meaningful benefit when compared to other models 110a-n.
Identifying 402, for each model 110a-n of a plurality of models 110a-n, based on one or more execution constraints for the plurality of models 110a-n, a corresponding node 108a-n of a plurality of nodes 108a-n also includes selecting 504, for each model 110a-n, based on the plurality of fitness scores, the corresponding node 108a-n (e.g., the node 108a-n to which a given model 110a-n will be deployed). In some embodiments, selecting, based on the plurality of fitness scores, the corresponding node 108a-n includes selecting, for each model 110a-n, a highest scoring node 108a-n. For example, a node 108a-n may be selected for each model 110a-n by traversing a listing or ordering of models 110a-n and selecting a node 108a-n for a currently selected model 110a-n. In some embodiments, after a model 110a-n is assigned to a given node 108a-n, the fitness scores for the given node 108a-n may be recalculated for each model 110a-n not having an assigned node 108a-n. Accordingly, in some embodiments, a node 108a-n already having an assigned model 110a-n may still be an optimal selection for deploying another model 110a-n. In other embodiments, selecting, based on the plurality of fitness scores, the corresponding node 108a-n includes generating multiple combinations or permutations of assigning models 110a-n for deployment to nodes 108a-n and calculating a best fit assignment for all of the plurality of models 110a-n (e.g., an assignment with a highest total fitness score across all models 110a-n).
For further explanation,
The method of FIG. 6 differs from
For further explanation,
The method of FIG. 7 differs from
In view of the explanations set forth above, readers will recognize that the benefits of distributed model execution include:
- Improved performance of a computing system by identifying optimal or best fitting nodes for model deployment and execution.
- Improved performance of a computing system by allowing for remote, distributed execution of models, leveraging more advanced hardware and computational resources than found in client systems.
- Improved performance of a computing system by deploying models such that model dependencies, encryption relationships, and authorization requirements are preserved.
Exemplary embodiments of the present disclosure are described largely in the context of a fully functional computer system for distributed model execution. Readers of skill in the art will recognize, however, that the present disclosure also can be embodied in a computer program product disposed upon computer readable storage media for use with any suitable data processing system. Such computer readable storage media can be any storage medium for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of such media include magnetic disks in hard drives or diskettes, compact disks for optical drives, magnetic tape, and others as will occur to those of skill in the art. Persons skilled in the art will immediately recognize that any computer system having suitable programming means will be capable of executing the steps of the method of the disclosure as embodied in a computer program product. Persons skilled in the art will recognize also that, although some of the exemplary embodiments described in this specification are oriented to software installed and executing on computer hardware, nevertheless, alternative embodiments implemented as firmware or as hardware are well within the scope of the present disclosure.
The present disclosure can be a system, a method, and/or a computer program product. The computer program product can include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium can be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or Flash memory, a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network can include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present disclosure can be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) can execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions can also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein includes an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams can represent a module, segment, or portion of instructions, which includes one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block can occur out of the order noted in the figures. For example, two blocks shown in succession can, in fact, be executed substantially concurrently, or the blocks can sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
It will be understood from the foregoing description that modifications and changes can be made in various embodiments of the present disclosure. The descriptions in this specification are for purposes of illustration only and are not to be construed in a limiting sense. The scope of the present disclosure is limited only by the language of the following claims.
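The disclosure describes models logically chained so that the output of some models is provided as input to other models, with a final model producing the prediction associated with the input data. The following is a minimal, hypothetical Python sketch of that chaining; it is illustrative only and not part of the specification, and every name in it (`run_pipeline`, `normalize`, `featurize`, `predict`) is an assumption rather than anything named in the application:

```python
# Illustrative sketch of chained model execution: each stage stands in
# for a model deployed to a node, and each stage's output is fed as
# input to the next stage until the final stage emits the prediction.
# In a real deployment, each call would be a remote invocation of the
# model on its assigned node.

def run_pipeline(stages, data):
    """Run `stages` (callables standing in for deployed models) in
    order, feeding each stage's output to the next. The last stage's
    output is the prediction associated with `data`."""
    result = data
    for model in stages:
        result = model(result)
    return result

# Hypothetical three-stage pipeline: normalize -> featurize -> predict.
normalize = lambda xs: [x / 100.0 for x in xs]       # scale raw readings
featurize = lambda xs: sum(xs) / len(xs)             # reduce to a mean feature
predict = lambda mean: "anomaly" if mean > 0.5 else "normal"

print(run_pipeline([normalize, featurize, predict], [70, 80, 90]))
```

Running the sketch chains the three stages over the input `[70, 80, 90]` and prints the final stage's classification, mirroring how a downstream model's output serves as the overall prediction.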
Claims
1. A method for distributed model execution, comprising:
- identifying, for each model of a plurality of models, based on one or more execution constraints for the plurality of models, a corresponding node of a plurality of nodes, wherein the plurality of nodes each comprise one or more computing devices or one or more virtual machines;
- deploying each model of the plurality of models to the corresponding node of the plurality of nodes; and
- wherein the plurality of models are configured to generate, based on data input to at least one model of the plurality of models, a prediction associated with the data.
2. The method of claim 1, wherein the one or more execution constraints comprise one or more model dependencies, one or more encryption constraints, or one or more authorization constraints.
3. The method of claim 1, further comprising generating the prediction based on a distributed execution of the plurality of models.
4. The method of claim 1, wherein identifying the corresponding node of the plurality of nodes is based on one or more node characteristics of the plurality of nodes.
5. The method of claim 4, wherein the one or more node characteristics comprise one or more of: one or more hardware resources of one or more of the plurality of nodes, or one or more software resources of one or more of the plurality of nodes.
6. The method of claim 1, wherein identifying the corresponding node of the plurality of nodes is based on one or more model characteristics of the plurality of models.
7. The method of claim 6, wherein the one or more model characteristics comprise one or more of: a data type for input data to one or more of the plurality of models, or a calculation type performed by one or more of the plurality of models.
8. The method of claim 1, wherein identifying, for each model of the plurality of models, the corresponding node of the plurality of nodes comprises:
- calculating, for each model of the plurality of models, a plurality of fitness scores for the plurality of nodes; and
- selecting, for each model, based on the plurality of fitness scores, the corresponding node.
9. The method of claim 1, further comprising configuring each node to communicate with at least one other node of the plurality of nodes.
10. The method of claim 9, wherein configuring each node to communicate with at least one other node of the plurality of nodes comprises configuring each node to provide output to or receive input from at least one other node.
11. The method of claim 1, further comprising redeploying one or more models of the plurality of models.
12. An apparatus for distributed model execution, the apparatus configured to perform steps comprising:
- identifying, for each model of a plurality of models, based on one or more execution constraints for the plurality of models, a corresponding node of a plurality of nodes,
- wherein the plurality of nodes each comprise one or more computing devices or one or more virtual machines;
- deploying each model of the plurality of models to the corresponding node of the plurality of nodes; and
- wherein the plurality of models are configured to generate, based on data input to at least one model of the plurality of models, a prediction associated with the data.
13. The apparatus of claim 12, wherein the one or more execution constraints comprise one or more model dependencies, one or more encryption constraints, or one or more authorization constraints.
14. The apparatus of claim 12, wherein the steps further comprise generating the prediction based on a distributed execution of the plurality of models.
15. The apparatus of claim 12, wherein identifying the corresponding node of the plurality of nodes is based on one or more node characteristics of the plurality of nodes.
16. The apparatus of claim 15, wherein the one or more node characteristics comprise one or more of: one or more hardware resources of one or more of the plurality of nodes, or one or more software resources of one or more of the plurality of nodes.
17. The apparatus of claim 12, wherein identifying the corresponding node of the plurality of nodes is based on one or more model characteristics of the plurality of models.
18. The apparatus of claim 17, wherein the one or more model characteristics comprise one or more of: a data type for input data to one or more of the plurality of models, or a calculation type performed by one or more of the plurality of models.
19. The apparatus of claim 12, wherein identifying, for each model of the plurality of models, the corresponding node of the plurality of nodes comprises:
- calculating, for each model of the plurality of models, a plurality of fitness scores for the plurality of nodes; and
- selecting, for each model, based on the plurality of fitness scores, the corresponding node.
20. The apparatus of claim 12, wherein the steps further comprise configuring each node to communicate with at least one other node of the plurality of nodes.
21. The apparatus of claim 20, wherein configuring each node to communicate with at least one other node of the plurality of nodes comprises configuring each node to provide output to or receive input from at least one other node.
22. The apparatus of claim 12, wherein the steps further comprise redeploying one or more models of the plurality of models.
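Claims 8 and 19 recite calculating, for each model, a plurality of fitness scores for the plurality of nodes and selecting the corresponding node based on those scores. The sketch below is one hypothetical way such scoring could work; it is not part of the claims, and all of its names and scoring choices (`Node`, `Model`, `fitness`, treating unmet hardware/software constraints as a zero score, penalizing surplus resources) are illustrative assumptions:

```python
# Hypothetical fitness-score-based node selection in the spirit of
# claims 8 and 19: score every (model, node) pair, then pick the
# highest-scoring node for each model.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    hardware: set   # node characteristics, e.g. {"cpu", "gpu"}
    software: set   # e.g. {"tensorflow"}


@dataclass
class Model:
    name: str
    required_hardware: set = field(default_factory=set)  # execution constraints
    required_software: set = field(default_factory=set)


def fitness(model: Model, node: Node) -> float:
    """Score how well a node satisfies a model's constraints.
    A node missing a required resource is ineligible (score 0)."""
    if not model.required_hardware <= node.hardware:
        return 0.0
    if not model.required_software <= node.software:
        return 0.0
    # Prefer nodes with less surplus hardware so specialized resources
    # (e.g. GPUs) remain free for the models that actually need them.
    surplus = len(node.hardware - model.required_hardware)
    return 1.0 / (1.0 + surplus)


def assign(models, nodes):
    """For each model, calculate a fitness score per node and select
    the corresponding node with the highest score."""
    placement = {}
    for m in models:
        scores = {n.name: fitness(m, n) for n in nodes}
        best = max(nodes, key=lambda n: scores[n.name])
        placement[m.name] = best.name
    return placement
```

For example, a model requiring a GPU would be placed on a GPU-equipped node, while a model with no hardware constraints would score highest on a plainer node whose surplus is smallest.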
Type: Application
Filed: Feb 16, 2021
Publication Date: Aug 19, 2021
Inventors: EUGENE VON NIEDERHAUSERN (ROSENBERG, TX), SREENIVASA GORTI (AUSTIN, TX), KEVIN W. DIVINCENZO (PFLUGERVILLE, TX), SRIDHAR SUDARSAN (AUSTIN, TX)
Application Number: 17/176,906