Abstract: Methods, systems, apparatus, and circuits for dynamically optimizing the circuit for the forward and backward propagation phases of neural network training, given a fixed resource budget. The circuits comprise: (1) a specialized circuit that can operate on a plurality of multi-dimensional inputs and weights for the forward propagation phase of neural networks; and (2) a specialized circuit that can operate on either gradients and inputs, or gradients and weights, for the backward propagation phase of neural networks.
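As an illustration of the two operand pairings named in this abstract, the following is a minimal software sketch for a single fully connected layer. It is not the claimed circuit. It assumes a layer of the form y = x · w, and the function names (`forward`, `backward_weights`, `backward_inputs`) are hypothetical. The forward phase combines inputs with weights; the backward phase combines gradients with inputs (to obtain weight gradients) or gradients with weights (to obtain input gradients).

```python
# Assumption: a dense layer y = x @ w, with x of shape (batch, n_in)
# and w of shape (n_in, n_out), stored as nested Python lists.

def forward(x, w):
    """Forward phase: inputs and weights -> outputs (y = x @ w)."""
    n_in, n_out = len(w), len(w[0])
    return [[sum(row[k] * w[k][j] for k in range(n_in)) for j in range(n_out)]
            for row in x]

def backward_weights(x, grad_y):
    """Backward phase, pairing 1: gradients and inputs -> dL/dw = x^T @ grad_y."""
    batch, n_in, n_out = len(x), len(x[0]), len(grad_y[0])
    return [[sum(x[b][i] * grad_y[b][j] for b in range(batch)) for j in range(n_out)]
            for i in range(n_in)]

def backward_inputs(grad_y, w):
    """Backward phase, pairing 2: gradients and weights -> dL/dx = grad_y @ w^T."""
    n_in, n_out = len(w), len(w[0])
    return [[sum(g[j] * w[i][j] for j in range(n_out)) for i in range(n_in)]
            for g in grad_y]
```

All three routines are matrix products with different operand pairings, which suggests why hardware for the two phases might compete for the same multiply-accumulate resources under a fixed budget.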
Abstract: For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain specific language (DSL) for a DSL operation with a native implementation, to translate a user defined function (UDF) into the native implementation by translating user defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
Type:
Grant
Filed:
December 3, 2019
Date of Patent:
December 7, 2021
Assignee:
BIGSTREAM SOLUTIONS, INC.
Inventors:
Weiwei Chen, Behnam Robatmili, Maysam Lavasani, John David Davis
Abstract: A data processing system is disclosed that includes machines having an in-line accelerator and a general purpose instruction-based processor. In one example, a machine comprises storage to store data and an input/output (I/O) processing unit coupled to the storage. The I/O processing unit includes an in-line accelerator that is configured for in-line stream processing of distributed multi-stage dataflow-based computations. For a first stage of operations, the in-line accelerator is configured to read data from the storage, to perform computations on the data, and to shuffle a result of the computations to generate a first set of shuffled data. The in-line accelerator performs the first stage of operations with buffer-less computations.
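The read, compute, and shuffle steps of the first stage can be sketched in software as a chain of generators. This is only a rough analogy for in-line stream processing, not the accelerator described in the abstract. The word-count computation, the CRC32-based partitioning, and all function names here are assumptions chosen for illustration. Each record flows through the compute step one at a time, without an intermediate collection between reading and shuffling.

```python
import zlib
from collections import defaultdict

def read_records(lines):
    """Stand-in for reading from storage: yields one record at a time."""
    for line in lines:
        yield line

def compute(records):
    """Hypothetical per-record computation: emit (word, 1) pairs."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs, num_partitions):
    """Route each pair to a partition chosen by a deterministic hash of its key."""
    partitions = defaultdict(list)
    for key, value in pairs:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append((key, value))
    return dict(partitions)

def first_stage(lines, num_partitions=2):
    """Read -> compute -> shuffle, streamed record by record."""
    return shuffle(compute(read_records(lines)), num_partitions)
```

Because the partition index depends only on the key, all pairs with the same key land in the same partition, which is the property a later stage would rely on.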