Patents by Inventor Reed Kotler

Reed Kotler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Software managed memory hierarchy

Patent number: 11989581

Abstract: A method, system, and apparatus are disclosed herein for bridging a deterministic phase of instructions with a non-deterministic phase of instructions when those instructions are executed by a machine learning accelerator while executing a machine learning network. Specifically, data is transferred from off-chip memory to on-chip memory (non-deterministic phase of instructions). The data transfer involves determining whether certain on-chip memory is already storing data that has not been consumed yet (e.g., certain memory locations on-chip may be storing data for future consumption and should not be overwritten). Based on determining that the certain on-chip memory is not storing data that has not been consumed yet, the data may be transferred from the off-chip memory to the on-chip memory and the target memory locations may be marked as storing data that has not been consumed yet. The deterministic phase of instructions may be started subsequently.

Type: Grant

Filed: April 17, 2020

Date of Patent: May 21, 2024

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Reed Kotler
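
The abstract for patent 11989581 above describes a check-then-mark transfer: data moves on-chip only if the target locations hold no unconsumed data, and the targets are then marked as unconsumed. The listing gives no implementation, so the following is only a minimal sketch of that idea. The class `OnChipMemory`, its per-location flag list, and the method names are all hypothetical.

```python
class OnChipMemory:
    """Toy on-chip memory with one 'unconsumed' flag per location (hypothetical model)."""

    def __init__(self, size):
        self.data = [None] * size
        self.unconsumed = [False] * size

    def try_load(self, start, values):
        """Copy values in only if no target location holds unconsumed data."""
        targets = range(start, start + len(values))
        if any(self.unconsumed[i] for i in targets):
            return False  # would overwrite data still awaiting consumption
        for i, v in zip(targets, values):
            self.data[i] = v
            self.unconsumed[i] = True  # mark as not yet consumed
        return True

    def consume(self, start, count):
        """Read values and clear their flags so the locations can be reused."""
        out = self.data[start:start + count]
        for i in range(start, start + count):
            self.unconsumed[i] = False
        return out
```

In this sketch, a second `try_load` onto the same locations returns `False` until `consume` has cleared the flags, which is the point at which a deterministic phase could safely be preceded by a new transfer.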

Inter-processor data transfer in a machine learning accelerator, using statically scheduled instructions

Patent number: 11886981

Abstract: A compiler generates a computer program implementing a machine learning network on a machine learning accelerator (MLA) including interconnected processing elements. The computer program includes data transfer instructions for non-colliding data transfers between the processing elements. To generate the data transfer instructions, the compiler determines non-conflicting data transfer paths for data transfers based on a topology of the interconnections between processing elements, on dependencies of the instructions and on a duration for execution of the instructions. Each data transfer path specifies a routing and a time slot for the data transfer. The compiler generates data transfer instructions that specify routing of the data transfers and generates a static schedule that schedules execution of the data transfer instructions during the time slots for the data transfers.

Type: Grant

Filed: May 1, 2020

Date of Patent: January 30, 2024

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler
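
The abstract for patent 11886981 above says each data transfer path specifies a routing and a time slot, chosen so that transfers do not collide. As an illustration only, the sketch below assumes a 2-D mesh of tiles, X-then-Y routing, and one hop per time slot; none of these details come from the listing. The function names and the `earliest_start` field (a stand-in for instruction dependencies) are hypothetical.

```python
def xy_route(src, dst):
    """Links (pairs of adjacent tile coordinates) along an X-then-Y path."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def schedule_transfers(transfers):
    """Assign each (src, dst, earliest_start) a start slot with no link reuse.

    Hop k of a transfer occupies its k-th link during slot start + k.
    """
    reserved = set()  # (link, slot) pairs already in use
    schedule = []
    for src, dst, earliest in transfers:
        links = xy_route(src, dst)
        start = earliest
        while any((link, start + k) in reserved for k, link in enumerate(links)):
            start += 1  # slide the whole transfer later until it fits
        reserved.update((link, start + k) for k, link in enumerate(links))
        schedule.append((src, dst, start))
    return schedule
```

With two transfers that both want the link from tile (0, 0) to tile (1, 0) in slot 0, this greedy sketch keeps the first at slot 0 and pushes the second to slot 1. A real compiler could also reroute instead of delaying; that choice is not modeled here.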

Ordering computations of a machine learning network in a machine learning accelerator for efficient memory usage

Patent number: 11803740

Abstract: A compiler manages memory usage in the machine learning accelerator by intelligently ordering computations of a machine learning network. The compiler identifies partial networks of the machine learning network representing portions of the machine learning network across multiple layers on which an output or set of outputs are dependent. Because any given output may depend on only a limited subset of intermediate outputs from the prior layers, each partial network may include only a small fraction of the intermediate outputs from each layer. Instead of implementing the MLN by computing one layer at a time, the compiler schedules instructions to sequentially implement partial networks. As each layer of a partial network is completed, the intermediate outputs can be released from memory. The described technique enables intermediate outputs to be directly streamed between processing elements of the machine learning accelerator without requiring large transfers to and from external memory.

Type: Grant

Filed: February 8, 2023

Date of Patent: October 31, 2023

Assignee: SiMa Technologies, Inc.

Inventors: Reed Kotler, Nishit Shah
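
The abstract for patent 11803740 above rests on the observation that one output depends on only a small subset of the intermediate outputs of earlier layers. A hedged way to see this is to trace the dependency range backwards through a stack of layers. The sketch assumes 1-D "valid" convolutions with stride 1, which is an illustrative choice and not something stated in the listing; `partial_network` is a hypothetical helper.

```python
def partial_network(output_index, kernel_sizes):
    """Index ranges each earlier layer must supply for one final output.

    Assumes a stack of 1-D 'valid' convolutions with stride 1 (hypothetical);
    kernel_sizes[i] is the kernel width of layer i.
    Returns inclusive (lo, hi) ranges ordered from network input to final layer.
    """
    lo = hi = output_index
    ranges = [(lo, hi)]
    for k in reversed(kernel_sizes):
        hi = hi + k - 1  # a valid conv output j reads inputs j .. j + k - 1
        ranges.append((lo, hi))
    return list(reversed(ranges))
```

For two layers of width 3, `partial_network(0, [3, 3])` returns `[(0, 4), (0, 2), (0, 0)]`: the final output needs only three values from the first layer, however long the input is. Computing along such partial networks, rather than one full layer at a time, is what lets intermediate values be released early.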

ALLOCATING COMPUTATIONS OF A MACHINE LEARNING NETWORK IN A MACHINE LEARNING ACCELERATOR

Publication number: 20230334374

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The compiler allocates instructions of the computer program to different groups of processing elements (Tiles) for execution such that different groups of Tiles implement different layers of the machine learning network. The compiler may determine the size of the different groups based on a partial computation metric associated with the computations performed to implement the corresponding layer. Furthermore, the compiler may assign specific Tiles to each group based on a set of predefined layout constraints. The compiler may statically schedule at least a portion of the instructions into one or more deterministic phases for execution by the groups of Tiles.

Type: Application

Filed: June 26, 2023

Publication date: October 19, 2023

Inventors: Reed Kotler, Nishit Shah
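
The abstract for publication 20230334374 above says the compiler may size each group of Tiles from a computation metric for the corresponding layer. The listing does not define that metric or the sizing rule, so the sketch below simply assumes a positive per-layer cost (for example, an operation count) and splits the Tiles in proportion to it. `allocate_tiles` and its largest-remainder rounding are hypothetical, and layout constraints are not modeled.

```python
def allocate_tiles(layer_costs, total_tiles):
    """Split total_tiles among layers roughly in proportion to layer_costs.

    Every layer gets at least one tile; leftover tiles go to the layers with
    the largest fractional remainders. Costs are assumed to be positive.
    """
    n = len(layer_costs)
    if total_tiles < n:
        raise ValueError("need at least one tile per layer")
    spare = total_tiles - n  # tiles left after the one-each minimum
    total_cost = sum(layer_costs)
    shares = [spare * c / total_cost for c in layer_costs]
    sizes = [1 + int(s) for s in shares]
    leftover = total_tiles - sum(sizes)
    by_remainder = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes
```

For costs of 100, 300, and 600 and 16 Tiles, this returns `[2, 5, 9]`, so the heaviest layer receives the largest group and every Tile is assigned.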

Scheduling off-chip memory access for programs with predictable execution

Patent number: 11782757

Abstract: A machine learning network is implemented by executing a computer program of instructions on a machine learning accelerator (MLA) comprising a plurality of interconnected storage elements (SEs) and processing elements (PEs). The instructions are partitioned into blocks, which are retrieved from off-chip memory. The block includes a set of deterministic instructions (MLA instructions) to be executed by on-chip storage elements and/or processing elements according to a static schedule from a compiler. The MLA instructions may require data retrieved from off-chip memory by memory access instructions contained in prior blocks. The compiler also schedules the memory access instructions in a manner that avoids contention for access to the off-chip memory. By avoiding contention, the execution time of off-chip memory accesses becomes predictable enough and short enough that the memory access instructions may be scheduled so that they are known to complete before the retrieved data is required.

Type: Grant

Filed: May 7, 2021

Date of Patent: October 10, 2023

Assignee: SiMa Technologies, Inc.

Inventor: Reed Kotler
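
The abstract for patent 11782757 above describes scheduling memory access instructions so that they do not contend for off-chip memory and are known to complete before their data is required. The sketch below illustrates that property under simplifying assumptions that are not in the listing: a single memory channel, known access durations, and earliest-deadline-first ordering chosen here for simplicity. The function name and the tuple layout are hypothetical.

```python
def schedule_memory_accesses(requests):
    """Serialize off-chip accesses on one channel so each meets its deadline.

    requests: list of (name, duration, needed_by) in cycles (hypothetical units).
    Returns {name: (start, end)} or raises if some access would finish too late.
    """
    schedule = {}
    now = 0
    for name, duration, needed_by in sorted(requests, key=lambda r: r[2]):
        start, end = now, now + duration
        if end > needed_by:
            raise ValueError(f"{name} cannot complete before cycle {needed_by}")
        schedule[name] = (start, end)
        now = end  # the channel is busy until this access ends, so none overlap
    return schedule
```

Because accesses never overlap in this model, each end time is fixed at compile time, and the deterministic instructions that consume the data can be placed at or after `needed_by` without a run-time check.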

Avoiding data routing conflicts in a machine learning accelerator

Patent number: 11734549

Abstract: A compiler receives a description of a machine learning network (MLN) and generates a computer program that implements the MLN on a machine learning accelerator (MLA). To implement the MLN, the compiler generates compute instructions that implement computations of the MLN on different processing units (Tiles), and data transfer instructions that transfer data used in the computations. The compiler may statically schedule at least a portion of the instructions for execution by the Tiles according to fixed timing. The compiler may initially implement data transfers between non-adjacent Tiles (or external memories) by implementing a sequence of transfers through one or more intermediate Tiles (or external memories) in accordance with a set of default routing rules that dictates the data path. The computer program may then be simulated to identify routing conflicts. When routing conflicts are detected, the compiler updates the computer program in a manner that avoids the conflicts.

Type: Grant

Filed: April 21, 2020

Date of Patent: August 22, 2023

Assignee: SiMa Technologies, Inc.

Inventors: Reed Kotler, Nishit Shah

Allocating computations of a machine learning network in a machine learning accelerator

Patent number: 11734605

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The compiler allocates instructions of the computer program to different groups of processing elements (Tiles) for execution such that different groups of Tiles implement different layers of the machine learning network. The compiler may determine the size of the different groups based on a partial computation metric associated with the computations performed to implement the corresponding layer. Furthermore, the compiler may assign specific Tiles to each group based on a set of predefined layout constraints. The compiler may statically schedule at least a portion of the instructions into one or more deterministic phases for execution by the groups of Tiles.

Type: Grant

Filed: April 29, 2020

Date of Patent: August 22, 2023

Assignee: SiMa Technologies, Inc.

Inventors: Reed Kotler, Nishit Shah

ORDERING COMPUTATIONS OF A MACHINE LEARNING NETWORK IN A MACHINE LEARNING ACCELERATOR FOR EFFICIENT MEMORY USAGE

Publication number: 20230186063

Abstract: A compiler manages memory usage in the machine learning accelerator by intelligently ordering computations of a machine learning network. The compiler identifies partial networks of the machine learning network representing portions of the machine learning network across multiple layers on which an output or set of outputs are dependent. Because any given output may depend on only a limited subset of intermediate outputs from the prior layers, each partial network may include only a small fraction of the intermediate outputs from each layer. Instead of implementing the MLN by computing one layer at a time, the compiler schedules instructions to sequentially implement partial networks. As each layer of a partial network is completed, the intermediate outputs can be released from memory. The described technique enables intermediate outputs to be directly streamed between processing elements of the machine learning accelerator without requiring large transfers to and from external memory.

Type: Application

Filed: February 8, 2023

Publication date: June 15, 2023

Inventors: Reed Kotler, Nishit Shah

Ordering computations of a machine learning network in a machine learning accelerator for efficient memory usage

Patent number: 11586894

Abstract: A compiler efficiently manages memory usage in the machine learning accelerator by intelligently ordering computations of a machine learning network. The compiler identifies a set of partial networks of the machine learning network representing portions of the machine learning network across multiple layers on which an output or set of outputs are dependent. Because any given output may depend on only a limited subset of intermediate outputs from the prior layers, each partial network may include only a small fraction of the intermediate outputs from each layer. Instead of implementing the MLN by computing one layer at a time, the compiler schedules instructions to sequentially implement partial networks. As each layer of a partial network is completed, the intermediate outputs can be released from memory. The described technique enables intermediate outputs to be directly streamed between processing elements of the machine learning accelerator without requiring large transfers to and from external memory.

Type: Grant

Filed: May 4, 2020

Date of Patent: February 21, 2023

Assignee: SiMa Technologies, Inc.

Inventors: Reed Kotler, Nishit Shah

MACHINE LEARNING NETWORK IMPLEMENTED BY STATICALLY SCHEDULED INSTRUCTIONS

Publication number: 20230023303

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Application

Filed: October 3, 2022

Publication date: January 26, 2023

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland, Bradley Taylor

SCHEDULING OFF-CHIP MEMORY ACCESS FOR PROGRAMS WITH PREDICTABLE EXECUTION

Publication number: 20220357984

Abstract: A machine learning network is implemented by executing a computer program of instructions on a machine learning accelerator (MLA) comprising a plurality of interconnected storage elements (SEs) and processing elements (PEs). The instructions are partitioned into blocks, which are retrieved from off-chip memory. The block includes a set of deterministic instructions (MLA instructions) to be executed by on-chip storage elements and/or processing elements according to a static schedule from a compiler. The MLA instructions may require data retrieved from off-chip memory by memory access instructions contained in prior blocks. The compiler also schedules the memory access instructions in a manner that avoids contention for access to the off-chip memory. By avoiding contention, the execution time of off-chip memory accesses becomes predictable enough and short enough that the memory access instructions may be scheduled so that they are known to complete before the retrieved data is required.

Type: Application

Filed: May 7, 2021

Publication date: November 10, 2022

Inventor: Reed Kotler

Machine learning network implemented by statically scheduled instructions, with system-on-chip

Patent number: 11403519

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Grant

Filed: April 6, 2020

Date of Patent: August 2, 2022

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland

Machine learning network implemented by statically scheduled instructions, with MLA chip

Patent number: 11354570

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Grant

Filed: April 6, 2020

Date of Patent: June 7, 2022

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland

Machine learning network implemented by statically scheduled instructions, with compiler

Patent number: 11321607

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Grant

Filed: April 3, 2020

Date of Patent: May 3, 2022

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland

INTER-PROCESSOR DATA TRANSFER IN A MACHINE LEARNING ACCELERATOR, USING STATICALLY SCHEDULED INSTRUCTIONS

Publication number: 20210342673

Abstract: A compiler generates a computer program implementing a machine learning network on a machine learning accelerator (MLA) including interconnected processing elements. The computer program includes data transfer instructions for non-colliding data transfers between the processing elements. To generate the data transfer instructions, the compiler determines non-conflicting data transfer paths for data transfers based on a topology of the interconnections between processing elements, on dependencies of the instructions and on a duration for execution of the instructions. Each data transfer path specifies a routing and a time slot for the data transfer. The compiler generates data transfer instructions that specify routing of the data transfers and generates a static schedule that schedules execution of the data transfer instructions during the time slots for the data transfers.

Type: Application

Filed: May 1, 2020

Publication date: November 4, 2021

Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler

ORDERING COMPUTATIONS OF A MACHINE LEARNING NETWORK IN A MACHINE LEARNING ACCELERATOR FOR EFFICIENT MEMORY USAGE

Publication number: 20210342675

Abstract: A compiler efficiently manages memory usage in the machine learning accelerator by intelligently ordering computations of a machine learning network. The compiler identifies a set of partial networks of the machine learning network representing portions of the machine learning network across multiple layers on which an output or set of outputs are dependent. Because any given output may depend on only a limited subset of intermediate outputs from the prior layers, each partial network may include only a small fraction of the intermediate outputs from each layer. Instead of implementing the MLN by computing one layer at a time, the compiler schedules instructions to sequentially implement partial networks. As each layer of a partial network is completed, the intermediate outputs can be released from memory. The described technique enables intermediate outputs to be directly streamed between processing elements of the machine learning accelerator without requiring large transfers to and from external memory.

Type: Application

Filed: May 4, 2020

Publication date: November 4, 2021

Inventors: Reed Kotler, Nishit Shah

ALLOCATING COMPUTATIONS OF A MACHINE LEARNING NETWORK IN A MACHINE LEARNING ACCELERATOR

Publication number: 20210342733

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The compiler allocates instructions of the computer program to different groups of processing elements (Tiles) for execution such that different groups of Tiles implement different layers of the machine learning network. The compiler may determine the size of the different groups based on a partial computation metric associated with the computations performed to implement the corresponding layer. Furthermore, the compiler may assign specific Tiles to each group based on a set of predefined layout constraints. The compiler may statically schedule at least a portion of the instructions into one or more deterministic phases for execution by the groups of Tiles.

Type: Application

Filed: April 29, 2020

Publication date: November 4, 2021

Inventors: Reed Kotler, Nishit Shah

SOFTWARE MANAGED MEMORY HIERARCHY

Publication number: 20210326173

Abstract: A method, system, and apparatus are disclosed herein for bridging a deterministic phase of instructions with a non-deterministic phase of instructions when those instructions are executed by a machine learning accelerator while executing a machine learning network. Specifically, data is transferred from off-chip memory to on-chip memory (non-deterministic phase of instructions). The data transfer involves determining whether certain on-chip memory is already storing data that has not been consumed yet (e.g., certain memory locations on-chip may be storing data for future consumption and should not be overwritten). Based on determining that the certain on-chip memory is not storing data that has not been consumed yet, the data may be transferred from the off-chip memory to the on-chip memory and the target memory locations may be marked as storing data that has not been consumed yet. The deterministic phase of instructions may be started subsequently.

Type: Application

Filed: April 17, 2020

Publication date: October 21, 2021

Inventors: Nishit Shah, Reed Kotler

SYNCHRONIZATION OF PROCESSING ELEMENTS THAT EXECUTE STATICALLY SCHEDULED INSTRUCTIONS IN A MACHINE LEARNING ACCELERATOR

Publication number: 20210326189

Abstract: A method, system, and apparatus are disclosed herein for bridging a deterministic phase of instructions with a non-deterministic phase of instructions when those instructions are executed by a machine learning accelerator while executing a machine learning network. In the non-deterministic phase, data and instructions are transferred from off-chip memory to on-chip memory. When the transfer is complete, processing elements are synchronized and, upon synchronization, a deterministic phase of instructions is executed by the processing elements.

Type: Application

Filed: April 17, 2020

Publication date: October 21, 2021

Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler

AVOIDING DATA ROUTING CONFLICTS IN A MACHINE LEARNING ACCELERATOR

Publication number: 20210326681

Abstract: A compiler receives a description of a machine learning network (MLN) and generates a computer program that implements the MLN on a machine learning accelerator (MLA). To implement the MLN, the compiler generates compute instructions that implement computations of the MLN on different processing units (Tiles), and data transfer instructions that transfer data used in the computations. The compiler may statically schedule at least a portion of the instructions for execution by the Tiles according to fixed timing. The compiler may initially implement data transfers between non-adjacent Tiles (or external memories) by implementing a sequence of transfers through one or more intermediate Tiles (or external memories) in accordance with a set of default routing rules that dictates the data path. The computer program may then be simulated to identify routing conflicts. When routing conflicts are detected, the compiler updates the computer program in a manner that avoids the conflicts.

Type: Application

Filed: April 21, 2020

Publication date: October 21, 2021

Inventors: Reed Kotler, Nishit Shah
