Patents by Inventor Srivathsa Dhruvanarayan

Srivathsa Dhruvanarayan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Inter-processor data transfer in a machine learning accelerator, using statically scheduled instructions

Patent number: 11886981

Abstract: A compiler generates a computer program implementing a machine learning network on a machine learning accelerator (MLA) including interconnected processing elements. The computer program includes data transfer instructions for non-colliding data transfers between the processing elements. To generate the data transfer instructions, the compiler determines non-conflicting data transfer paths for data transfers based on a topology of the interconnections between processing elements, on dependencies of the instructions and on a duration for execution of the instructions. Each data transfer path specifies a routing and a time slot for the data transfer. The compiler generates data transfer instructions that specify routing of the data transfers and generates a static schedule that schedules execution of the data transfer instructions during the time slots for the data transfers.

Type: Grant

Filed: May 1, 2020

Date of Patent: January 30, 2024

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler
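The abstract of patent 11886981 describes a compiler assigning each data transfer a routing and a time slot so that transfers never collide. The following is a minimal sketch of that idea, not the patented method: the 2D mesh, the column-then-row routing policy, the greedy earliest-slot search, and all names (`xy_route`, `schedule_transfers`, `earliest`) are assumptions made for illustration. The `earliest` cycle stands in for whatever the instruction dependencies and execution durations would dictate.

```python
from typing import Dict, List, Set, Tuple

Tile = Tuple[int, int]    # (row, col) position in a hypothetical 2D mesh
Link = Tuple[Tile, Tile]  # directed link between two neighboring tiles


def xy_route(src: Tile, dst: Tile) -> List[Link]:
    """Route along the column axis first, then the row axis (assumed policy)."""
    links: List[Link] = []
    r, c = src
    while c != dst[1]:
        nc = c + (1 if dst[1] > c else -1)
        links.append(((r, c), (r, nc)))
        c = nc
    while r != dst[0]:
        nr = r + (1 if dst[0] > r else -1)
        links.append(((r, c), (nr, c)))
        r = nr
    return links


def schedule_transfers(transfers, duration: int = 1):
    """Assign each transfer a route and a start cycle at compile time.

    transfers: list of (name, src, dst, earliest), where `earliest` is the
    first cycle allowed by dependencies. Returns {name: (route, start)}
    such that no link carries two transfers in the same cycle.
    """
    busy: Dict[Link, Set[int]] = {}  # cycles during which each link is occupied
    plan = {}
    for name, src, dst, earliest in transfers:
        route = xy_route(src, dst)
        start = earliest
        while True:
            # Hop i occupies its link for `duration` cycles, one hop after another.
            needed = [(link, start + i * duration + t)
                      for i, link in enumerate(route)
                      for t in range(duration)]
            if all(cycle not in busy.get(link, set()) for link, cycle in needed):
                break
            start += 1  # slide the whole transfer one cycle later and retry
        for link, cycle in needed:
            busy.setdefault(link, set()).add(cycle)
        plan[name] = (route, start)
    return plan


plan = schedule_transfers([
    ("A", (0, 0), (0, 2), 0),  # uses link (0,1)->(0,2) during cycle 1
    ("B", (0, 1), (0, 2), 1),  # wants the same link in cycle 1, so it is delayed
])
```

Because every start cycle is fixed before execution, the resulting plan is a static schedule in the sense used by the abstract: nothing about the routing or timing is decided at run time.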
Heterogeneous computing on a system-on-chip, including machine learning inference

Patent number: 11631001

Abstract: A system-on-chip (SoC) integrated circuit product includes a machine learning accelerator (MLA). It also includes other processor cores, such as general purpose processors and application-specific processors. It also includes a network-on-chip for communication between the different modules. The SoC implements a heterogeneous compute environment because the processor cores are customized for different purposes and typically will use different instruction sets. Applications may use some or all of the functionalities offered by the processor cores, and the processor cores may be programmed into different pipelines to perform different tasks.

Type: Grant

Filed: April 10, 2020

Date of Patent: April 18, 2023

Assignee: SiMa Technologies, Inc.

Inventors: Srivathsa Dhruvanarayan, Nishit Shah, Bradley Taylor, Moenes Zaher Iskarous
MACHINE LEARNING NETWORK IMPLEMENTED BY STATICALLY SCHEDULED INSTRUCTIONS

Publication number: 20230023303

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Application

Filed: October 3, 2022

Publication date: January 26, 2023

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S.J Attia, Spenser Don Gilliland, Bradley Taylor
Efficient convolution of multi-channel input samples with multiple kernels

Patent number: 11488066

Abstract: Convolution of an input sample with multiple kernels is decomposed into matrix multiplications of a V×C matrix of input values times a C×K matrix of kernel values, producing a V×K product. For the second matrix, C is a channel dimension (i.e., each row of the second matrix is a different channel of the input sample and kernel) and K is the kernel dimension (i.e., each column of the second matrix is a different kernel), but all the values correspond to the same pixel position in the kernel. In the matrix product, V is the output dimension and K is the kernel dimension. Thus, each value in the output matrix is a partial product for a certain output pixel and kernel, and the matrix multiplication parallelizes the convolutions by calculating partial products for multiple output pixels and multiple kernels.

Type: Grant

Filed: April 21, 2020

Date of Patent: November 1, 2022

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Srivathsa Dhruvanarayan
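The decomposition in the abstract of patent 11488066 can be illustrated with a small pure-Python sketch. It assumes a stride of 1, no padding, and the machine-learning convention in which the kernel is not flipped; the data layouts `x[row][col][channel]` and `kern[kernel][r][s][channel]` and the function names are hypothetical. For each pixel position (r, s) in the kernel, a V×C matrix of input values is multiplied by a C×K matrix of kernel values, and the V×K partial products are accumulated over all kernel positions.

```python
def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of lists)."""
    return [[sum(a[i][j] * b[j][n] for j in range(len(b)))
             for n in range(len(b[0]))]
            for i in range(len(a))]


def conv_by_matmul(x, kern):
    """Convolve x[H][W][C] with kern[K][R][S][C]; returns a V x K matrix."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(kern), len(kern[0]), len(kern[0][0])
    OH, OW = H - R + 1, W - S + 1
    V = OH * OW  # number of output pixels
    out = [[0] * K for _ in range(V)]
    for r in range(R):
        for s in range(S):
            # V x C: for every output pixel, the input pixel under kernel position (r, s)
            inputs = [x[oy + r][ox + s] for oy in range(OH) for ox in range(OW)]
            # C x K: all kernels' values at the same kernel position (r, s)
            weights = [[kern[k][r][s][c] for k in range(K)] for c in range(C)]
            partial = matmul(inputs, weights)  # V x K partial products
            for v in range(V):
                for k in range(K):
                    out[v][k] += partial[v][k]
    return out


def conv_direct(x, kern):
    """Reference implementation: the usual nested-loop convolution."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(kern), len(kern[0]), len(kern[0][0])
    OH, OW = H - R + 1, W - S + 1
    return [[sum(x[oy + r][ox + s][c] * kern[k][r][s][c]
                 for r in range(R) for s in range(S) for c in range(C))
             for k in range(K)]
            for oy in range(OH) for ox in range(OW)]


# A 3x3 input with 2 channels and two 2x2 kernels, filled with arbitrary integers.
x = [[[(i * 3 + j) * 2 + c for c in range(2)] for j in range(3)] for i in range(3)]
kern = [[[[k + r - s + c for c in range(2)] for s in range(2)] for r in range(2)]
        for k in range(2)]
result = conv_by_matmul(x, kern)
```

Each call to `matmul` computes partial products for all V output pixels and all K kernels at once, which is the parallelism the abstract refers to.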
GENERATION OF DESCRIPTIVE DATA FOR PACKET FIELDS

Publication number: 20220345423

Abstract: Some embodiments provide a method for a parser of a processing pipeline. The method receives a packet for processing by a set of match-action stages of the processing pipeline. The method stores packet header field (PHF) values from a first set of PHFs of the packet in a set of data containers. The first set of PHFs are for use by the match-action stages. For a second set of PHFs not used by the match-action stages, the method generates descriptive data that identifies locations of the PHFs of the second set within the packet. The method sends (i) the set of data containers to the match-action stages and (ii) the packet data and the generated descriptive data outside of the match-action stages to a deparser that uses the packet data, generated descriptive data, and the set of data containers as modified by the match-action stages to reconstruct a modified packet.

Type: Application

Filed: July 8, 2022

Publication date: October 27, 2022

Applicant: Barefoot Networks, Inc.

Inventors: Gregory C. Watson, Srivathsa Dhruvanarayan, Glen Raymond Gibb, Constantine Calamvokis, Aled Justin Edwards
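The parser and deparser flow in the abstract of publication 20220345423 can be sketched as follows. This is an illustrative model only: the field layout tuples, the `(offset, length)` form of the descriptive data, the assumption that header fields are contiguous from offset 0, and the names `parse` and `deparse` are all hypothetical and are not taken from the application.

```python
def parse(packet: bytes, layout, used):
    """Split header fields into containers and descriptive data.

    layout: list of (field_name, offset, length) for the header fields.
    used:   names of fields that the match-action stages need.
    """
    containers = {}   # values of the first set of fields, sent to the stages
    descriptors = []  # (offset, length) locations of the second set of fields
    for name, offset, length in layout:
        if name in used:
            containers[name] = packet[offset:offset + length]
        else:
            descriptors.append((offset, length))
    return containers, descriptors


def deparse(packet: bytes, layout, containers, descriptors):
    """Rebuild the packet from (possibly modified) containers and original bytes."""
    out = bytearray()
    unused = iter(descriptors)
    for name, _offset, _length in layout:
        if name in containers:
            out += containers[name]           # value as modified by the stages
        else:
            offset, length = next(unused)     # copy untouched bytes from the packet
            out += packet[offset:offset + length]
    header_len = sum(length for _, _, length in layout)
    out += packet[header_len:]                # payload follows the headers (assumed)
    return bytes(out)


packet = bytes(range(10))
layout = [("dst", 0, 2), ("src", 2, 2), ("ttl", 4, 1)]
containers, descriptors = parse(packet, layout, used={"ttl"})
containers["ttl"] = b"\x03"  # stand-in for a match-action stage rewriting the field
rebuilt = deparse(packet, layout, containers, descriptors)
```

Only the `ttl` value travels through the stages in this sketch; the other fields bypass them and are recovered from the packet data using the descriptive data.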
Multichip timing synchronization circuits and methods

Patent number: 11474557

Abstract: In one embodiment, the present disclosure includes multichip timing synchronization circuits and methods. In one embodiment, hardware counters in different systems are synchronized. Programs on the systems may include synchronization instructions. A second system executes a synchronization instruction, and in response thereto, synchronizes a local software counter to a local hardware counter. The software counter on the second system may be delayed a fixed period of time corresponding to a program delay on the first system. The software counter on the second system may further be delayed by an offset to bring software counters on the two systems into sync.

Type: Grant

Filed: September 15, 2020

Date of Patent: October 18, 2022

Assignee: Groq, Inc.

Inventors: Gregory Michael Thorson, Srivathsa Dhruvanarayan
Generation of descriptive data for packet fields

Patent number: 11425058

Abstract: Some embodiments provide a method for a parser of a processing pipeline. The method receives a packet for processing by a set of match-action stages of the processing pipeline. The method stores packet header field (PHF) values from a first set of PHFs of the packet in a set of data containers. The first set of PHFs are for use by the match-action stages. For a second set of PHFs not used by the match-action stages, the method generates descriptive data that identifies locations of the PHFs of the second set within the packet. The method sends (i) the set of data containers to the match-action stages and (ii) the packet data and the generated descriptive data outside of the match-action stages to a deparser that uses the packet data, generated descriptive data, and the set of data containers as modified by the match-action stages to reconstruct a modified packet.

Type: Grant

Filed: May 20, 2020

Date of Patent: August 23, 2022

Assignee: Barefoot Networks, Inc.

Inventors: Gregory C. Watson, Srivathsa Dhruvanarayan, Glen Raymond Gibb, Constantine Calamvokis, Aled Justin Edwards
Machine learning network implemented by statically scheduled instructions, with system-on-chip

Patent number: 11403519

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Grant

Filed: April 6, 2020

Date of Patent: August 2, 2022

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland
Machine learning network implemented by statically scheduled instructions, with MLA chip

Patent number: 11354570

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Grant

Filed: April 6, 2020

Date of Patent: June 7, 2022

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland
Machine learning network implemented by statically scheduled instructions, with compiler

Patent number: 11321607

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Grant

Filed: April 3, 2020

Date of Patent: May 3, 2022

Assignee: SiMa Technologies, Inc.

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland
INTER-PROCESSOR DATA TRANSFER IN A MACHINE LEARNING ACCELERATOR, USING STATICALLY SCHEDULED INSTRUCTIONS

Publication number: 20210342673

Abstract: A compiler generates a computer program implementing a machine learning network on a machine learning accelerator (MLA) including interconnected processing elements. The computer program includes data transfer instructions for non-colliding data transfers between the processing elements. To generate the data transfer instructions, the compiler determines non-conflicting data transfer paths for data transfers based on a topology of the interconnections between processing elements, on dependencies of the instructions and on a duration for execution of the instructions. Each data transfer path specifies a routing and a time slot for the data transfer. The compiler generates data transfer instructions that specify routing of the data transfers and generates a static schedule that schedules execution of the data transfer instructions during the time slots for the data transfers.

Type: Application

Filed: May 1, 2020

Publication date: November 4, 2021

Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler
EFFICIENT CONVOLUTION OF MULTI-CHANNEL INPUT SAMPLES WITH MULTIPLE KERNELS

Publication number: 20210326750

Abstract: Convolution of an input sample with multiple kernels is decomposed into matrix multiplications of a V×C matrix of input values times a C×K matrix of kernel values, producing a V×K product. For the second matrix, C is a channel dimension (i.e., each row of the second matrix is a different channel of the input sample and kernel) and K is the kernel dimension (i.e., each column of the second matrix is a different kernel), but all the values correspond to the same pixel position in the kernel. In the matrix product, V is the output dimension and K is the kernel dimension. Thus, each value in the output matrix is a partial product for a certain output pixel and kernel, and the matrix multiplication parallelizes the convolutions by calculating partial products for multiple output pixels and multiple kernels.

Type: Application

Filed: April 21, 2020

Publication date: October 21, 2021

Inventors: Nishit Shah, Srivathsa Dhruvanarayan
SYNCHRONIZATION OF PROCESSING ELEMENTS THAT EXECUTE STATICALLY SCHEDULED INSTRUCTIONS IN A MACHINE LEARNING ACCELERATOR

Publication number: 20210326189

Abstract: A method, system, and apparatus are disclosed herein for bridging a deterministic phase of instructions with a non-deterministic phase of instructions when those instructions are executed by a machine learning accelerator while executing a machine learning network. In the non-deterministic phase, data and instructions are transferred from off-chip memory to on-chip memory. When the transfer is complete, processing elements are synchronized and, upon synchronization, a deterministic phase of instructions is executed by the processing elements.

Type: Application

Filed: April 17, 2020

Publication date: October 21, 2021

Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler
HETEROGENEOUS COMPUTING ON A SYSTEM-ON-CHIP, INCLUDING MACHINE LEARNING INFERENCE

Publication number: 20210319307

Abstract: A system-on-chip (SoC) integrated circuit product includes a machine learning accelerator (MLA). It also includes other processor cores, such as general purpose processors and application-specific processors. It also includes a network-on-chip for communication between the different modules. The SoC implements a heterogeneous compute environment because the processor cores are customized for different purposes and typically will use different instruction sets. Applications may use some or all of the functionalities offered by the processor cores, and the processor cores may be programmed into different pipelines to perform different tasks.

Type: Application

Filed: April 10, 2020

Publication date: October 14, 2021

Inventors: Srivathsa Dhruvanarayan, Nishit Shah, Bradley Taylor, Moenes Zaher Iskarous
MACHINE LEARNING NETWORK IMPLEMENTED BY STATICALLY SCHEDULED INSTRUCTIONS, WITH SYSTEM-ON-CHIP

Publication number: 20210312322

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Application

Filed: April 6, 2020

Publication date: October 7, 2021

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S.J Attia, Spenser Don Gilliland
MACHINE LEARNING NETWORK IMPLEMENTED BY STATICALLY SCHEDULED INSTRUCTIONS, WITH MLA CHIP

Publication number: 20210312267

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Application

Filed: April 6, 2020

Publication date: October 7, 2021

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S.J Attia, Spenser Don Gilliland
MACHINE LEARNING NETWORK IMPLEMENTED BY STATICALLY SCHEDULED INSTRUCTIONS, WITH COMPILER

Publication number: 20210312320

Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.

Type: Application

Filed: April 3, 2020

Publication date: October 7, 2021

Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S.J Attia, Spenser Don Gilliland
Multichip fault management

Patent number: 11115147

Abstract: Embodiments of the present disclosure pertain to improved circuit and system architectures for identifying and managing operating statuses and faults in a system having multiple processing circuit chips. Each of the multiple processing circuit chips includes multiple signal rings, one to provide internal communications among circuitry within the circuit chip, and another with inter-chip communications circuitry to provide communications with neighboring circuit chips. One of the multiple processing circuit chips further includes external communications circuitry to provide communications with an external host.

Type: Grant

Filed: January 9, 2019

Date of Patent: September 7, 2021

Assignee: Groq, Inc.

Inventors: Matthew Pond Baker, Srivathsa Dhruvanarayan, Boone Jared Severson
QUEUE SCHEDULER CONTROL VIA PACKET DATA

Publication number: 20210105220

Abstract: Some embodiments provide a method for a hardware forwarding element that includes multiple queues. The method receives a packet at a multi-stage processing pipeline of the hardware forwarding element. The method determines, at one of the stages of the processing pipeline, to modify a setting of a particular one of the queues. The method stores an identifier for the particular queue and instructions to modify the queue setting with data passed through the processing pipeline for the packet. The stored information is subsequently used by the hardware forwarding element to modify the queue setting.

Type: Application

Filed: October 16, 2020

Publication date: April 8, 2021

Inventors: Jeongkeun Lee, Yi Li, Michael Feng, Srivathsa Dhruvanarayan, Anurag Agrawal
Copying packet data to mirror buffer

Patent number: 10949199

Abstract: Some embodiments provide a method for a network forwarding integrated circuit (IC). The method receives packet data with an instruction to copy a portion of the packet data to a temporary storage of the network forwarding IC. The portion is larger than a maximum entry size of the temporary storage. The method generates a header for each of multiple packet data sections for storage in entries of the temporary storage, with each packet data section including a sub-portion of the packet data portion. The method sends the packet data sections with the generated headers to the temporary storage for storage in multiple separate temporary storage entries.

Type: Grant

Filed: December 8, 2017

Date of Patent: March 16, 2021

Assignee: Barefoot Networks, Inc.

Inventors: Xiaozhou Li, Jeongkeun Lee, Srivathsa Dhruvanarayan, Anurag Agrawal, Changhoon Kim, Alain Loge
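The sectioning step in the abstract of patent 10949199 can be sketched as below. The two-byte header (a section index and a last-section flag), the choice to count the header against the maximum entry size, and the function name are assumptions for illustration; the abstract does not specify the header contents.

```python
HEADER_SIZE = 2  # hypothetical header: [section index, last-section flag]


def split_for_mirror_buffer(portion: bytes, max_entry_size: int):
    """Split a packet data portion into header-prefixed sections.

    Each returned entry is at most max_entry_size bytes, so every section
    fits in one entry of the temporary storage.
    """
    payload_size = max_entry_size - HEADER_SIZE
    if payload_size <= 0:
        raise ValueError("max_entry_size must exceed the header size")
    chunks = [portion[i:i + payload_size]
              for i in range(0, len(portion), payload_size)]
    if len(chunks) > 256:
        raise ValueError("section index does not fit in one header byte")
    entries = []
    for index, chunk in enumerate(chunks):
        is_last = 1 if index == len(chunks) - 1 else 0
        header = bytes([index, is_last])  # generated per section
        entries.append(header + chunk)
    return entries


# A 10-byte portion is larger than the 6-byte maximum entry size, so it is split.
entries = split_for_mirror_buffer(bytes(range(10)), max_entry_size=6)
```

A reader of the temporary storage could reassemble the original portion by concatenating the bytes after each header in index order and stopping at the entry whose flag is set.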
