Abstract: In a data processing system running at least one application on a hardware platform that includes at least one processor and a plurality of coprocessors, at least one kernel dispatched by an application is intercepted by an intermediate software layer running logically between the application and the system software. Compute functions are determined within kernel(s), and data dependencies are determined among the compute functions. The compute functions are dispatched to selected ones of the coprocessors based at least in part on the determined data dependencies and kernel results are returned to the application that dispatched the respective kernel.
Type:
Application
Filed:
July 6, 2017
Publication date:
January 10, 2019
Applicant:
Bitfusion.io, Inc.
Inventors:
Mazhar MEMON, Subramanian RAMA, Maciej BAJKOWSKI
Abstract: In a data processing system running at least one application on a hardware platform that includes at least one processor and a plurality of coprocessors, at least one kernel dispatched by an application is intercepted by an intermediate software layer running logically between the application and the system software. Compute functions are determined within kernel(s), and data dependencies are determined among the compute functions. The compute functions are dispatched to selected ones of the coprocessors based at least in part on the determined data dependencies and kernel results are returned to the application that dispatched the respective kernel.
Type:
Grant
Filed:
July 6, 2017
Date of Patent:
January 14, 2020
Assignee:
Bitfusion.io, Inc.
Inventors:
Mazhar Memon, Subramanian Rama, Maciej Bajkowski
Abstract: Execution of multiple execution streams is scheduled on a plurality of coprocessors. A software layer located logically between applications and the coprocessors determines dependencies within the execution streams, each said dependency being a condition in one of the execution streams that must be satisfied in order for execution of at least one other of the execution streams to proceed on corresponding ones of the coprocessors. The dependencies are then represented in a data structure and an optimized execution schedule is determined for the execution streams according to the dependencies. Simultaneous execution of a plurality of the execution streams is then dynamically reordered according to the optimized execution schedule.
Abstract: At least one application of a client executes via system software on a hardware computing system that includes at least one CPU and at least one coprocessor. A virtualization layer establishes unified memory address space between the client and the hardware computing system, which also includes memory associated with the at least one coprocessor. The virtualization layer then synchronizes memory associated with the client and memory associated the at least one coprocessor. The virtualization layer may be installed and run in a non-privileged, user space, without modification of the application or of the system software running on the hardware computing system.
Abstract: An interface software layer is interposed between at least one application and a plurality of coprocessors. A data and command stream issued by the application(s) to an API of an intended one of the coprocessors is intercepted by the layer, which also acquires and stores the execution state information for the intended coprocessor at a coprocessor synchronization boundary. At least a portion of the intercepted data and command stream data is stored in a replay log associated with the intended coprocessor. The replay log associated with the intended coprocessor is then read out, along with the stored execution state information, and is submitted to and serviced by at least one different one of the coprocessors other than the intended coprocessor.
Type:
Grant
Filed:
March 7, 2017
Date of Patent:
April 16, 2019
Assignee:
Bitfusion.io, Inc.
Inventors:
Mazhar Memon, Subramanian Rama, Maciej Bajkowski
Abstract: An interface software layer is interposed between at least one application and a plurality of coprocessors. A data and command stream issued by the application(s) to an API of an intended one of the coprocessors is intercepted by the layer, which also acquires and stores the execution state information for the intended coprocessor at a coprocessor synchronization boundary. At least a portion of the intercepted data and command stream data is stored in a replay log associated with the intended coprocessor. The replay log associated with the intended coprocessor is then read out, along with the stored execution state information, and is submitted to and serviced by at least one different one of the coprocessors other than the intended coprocessor.
Type:
Application
Filed:
March 7, 2017
Publication date:
October 12, 2017
Applicant:
Bitfusion.io, Inc.
Inventors:
Mazhar MEMON, Subramanian RAMA, Maciej BAJKOWSKI
Abstract: An interface software layer is interposed between at least one application and a plurality of coprocessors. A data and command stream issued by the application(s) to an API of an intended one of the coprocessors is intercepted by the layer, which also acquires and stores the execution state information for the intended coprocessor at a coprocessor synchronization boundary. At least a portion of the intercepted data and command stream data is stored in a replay log associated with the intended coprocessor. The replay log associated with the intended coprocessor is then read out, along with the stored execution state information, and is submitted to and serviced by at least one different one of the coprocessors other than the intended coprocessor.
Type:
Application
Filed:
March 16, 2019
Publication date:
July 11, 2019
Applicant:
Bitfusion.io, Inc.
Inventors:
Mazhar MEMON, Subramanian RAMA, Maciej BAJKOWSKI
Abstract: At least one application runs on a hardware platform that includes a plurality of coprocessors, each of which has a respective internal memory space. An intermediate software layer (MVL) is transparent to the application and intercepts calls for coprocessor use. If the data corresponding to an application's call, or separate calls from different entities (including different applications) to the same coprocessor, to the API of a target coprocessor, cannot be stored within the available internal memory space of the target coprocessor, but comprises data subsets that individually can, the MVL intercepts the call response to the application/entities and indicates that the target coprocessor can handle the request. The MVL then transfers the data subsets to the target coprocessor as needed by the corresponding kernel(s) and swaps out each data subset to the internal memory of another coprocessor to make room for subsequently needed data subsets.