Patents by Inventor Srivathsa Dhruvanarayan
Srivathsa Dhruvanarayan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11886981
Abstract: A compiler generates a computer program implementing a machine learning network on a machine learning accelerator (MLA) including interconnected processing elements. The computer program includes data transfer instructions for non-colliding data transfers between the processing elements. To generate the data transfer instructions, the compiler determines non-conflicting data transfer paths for data transfers based on a topology of the interconnections between processing elements, on dependencies of the instructions and on a duration for execution of the instructions. Each data transfer path specifies a routing and a time slot for the data transfer. The compiler generates data transfer instructions that specify routing of the data transfers and generates a static schedule that schedules execution of the data transfer instructions during the time slots for the data transfers.
Type: Grant
Filed: May 1, 2020
Date of Patent: January 30, 2024
Assignee: SiMa Technologies, Inc.
Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler
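As a rough illustration of what non-conflicting data transfer paths, each with a routing and a time slot, could look like, the sketch below reserves every link of a 2D mesh per cycle. The mesh coordinates, the x-then-y routing, the one-hop-per-cycle timing, and the greedy first-fit search are all assumptions made for illustration; the abstract does not say how the compiler chooses paths, and the names `xy_route` and `schedule_transfers` are hypothetical.

```python
def xy_route(src, dst):
    """Return the directed links from src to dst, moving along x first, then y.

    Assumption: processing elements sit on a 2D mesh addressed by (x, y).
    """
    links = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def schedule_transfers(transfers):
    """Assign each transfer a route and a start cycle so no link is shared in a cycle.

    transfers: list of (src, dst, ready_cycle), where ready_cycle stands in for
    the result of dependency and duration analysis (hypothetical input).
    Returns a list of (links, start_cycle) in the same order.
    """
    reserved = set()  # (link, cycle) pairs already claimed
    schedule = []
    for src, dst, ready in transfers:
        links = xy_route(src, dst)
        start = ready
        # Greedy first fit: delay the transfer until every hop finds its link free.
        while any((link, start + hop) in reserved for hop, link in enumerate(links)):
            start += 1
        reserved.update((link, start + hop) for hop, link in enumerate(links))
        schedule.append((links, start))
    return schedule


example = [((0, 0), (2, 0), 0), ((0, 0), (1, 0), 0), ((1, 0), (2, 0), 0)]
starts = [start for _, start in schedule_transfers(example)]
print(starts)  # [0, 1, 0]
```

Because every (link, cycle) pair is fixed ahead of time, the resulting schedule is static: nothing about the routing or timing is left to be decided at run time.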
-
Patent number: 11631001
Abstract: A system-on-chip (SoC) integrated circuit product includes a machine learning accelerator (MLA). It also includes other processor cores, such as general purpose processors and application-specific processors. It also includes a network-on-chip for communication between the different modules. The SoC implements a heterogeneous compute environment because the processor cores are customized for different purposes and typically will use different instruction sets. Applications may use some or all of the functionalities offered by the processor cores, and the processor cores may be programmed into different pipelines to perform different tasks.
Type: Grant
Filed: April 10, 2020
Date of Patent: April 18, 2023
Assignee: SiMa Technologies, Inc.
Inventors: Srivathsa Dhruvanarayan, Nishit Shah, Bradley Taylor, Moenes Zaher Iskarous
-
Publication number: 20230023303
Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.
Type: Application
Filed: October 3, 2022
Publication date: January 26, 2023
Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S.J Attia, Spenser Don Gilliland, Bradley Taylor
-
Patent number: 11488066
Abstract: Convolutions of an input sample with multiple kernels is decomposed into matrix multiplications of a V×C matrix of input values times a C×K matrix of kernel values, producing a V×K product. For the second matrix, C is a channel dimension (i.e., each row of the second matrix is a different channel of the input sample and kernel) and K is the kernel dimension (i.e., each column of the second matrix is a different kernel), but all the values correspond to the same pixel position in the kernel. In the matrix product, V is the output dimension and K is the kernel dimension. Thus, each value in the output matrix is a partial product for a certain output pixel and kernel, and the matrix multiplication parallelizes the convolutions by calculating partial products for multiple output pixels and multiple kernels.
Type: Grant
Filed: April 21, 2020
Date of Patent: November 1, 2022
Assignee: SiMa Technologies, Inc.
Inventors: Nishit Shah, Srivathsa Dhruvanarayan
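The decomposition described in this abstract can be checked on a small example. The sketch below is only an illustration under stated assumptions: stride 1, no padding, no kernel flip (the cross-correlation convention common in machine learning), and plain Python lists instead of accelerator hardware. The function names and the `image[h][w][c]` / `kernels[r][s][c][k]` layouts are hypothetical choices, not taken from the patent.

```python
def matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv_via_matmul(image, kernels):
    """Convolve image[h][w][c] with kernels[r][s][c][k] as a sum of matrix products.

    Returns out[v][k], where v = i * out_w + j indexes the output pixels.
    """
    H, W = len(image), len(image[0])
    R, S = len(kernels), len(kernels[0])
    K = len(kernels[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[0] * K for _ in range(out_h * out_w)]
    for r in range(R):
        for s in range(S):
            # V x C matrix: one row per output pixel, one column per channel,
            # all taken at the same kernel pixel position (r, s).
            x = [image[i + r][j + s] for i in range(out_h) for j in range(out_w)]
            # C x K matrix: one row per channel, one column per kernel.
            w = kernels[r][s]
            partial = matmul(x, w)  # V x K partial products
            for v in range(len(out)):
                for k in range(K):
                    out[v][k] += partial[v][k]
    return out


def conv_direct(image, kernels):
    """Reference implementation: the usual nested-sum convolution."""
    H, W = len(image), len(image[0])
    R, S = len(kernels), len(kernels[0])
    C, K = len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    return [
        [
            sum(image[i + r][j + s][c] * kernels[r][s][c][k]
                for r in range(R) for s in range(S) for c in range(C))
            for k in range(K)
        ]
        for i in range(out_h) for j in range(out_w)
    ]


# 3x3 image with 2 channels, two 2x2 kernels: V = 4, C = 2, K = 2.
image = [[[h * 3 + w + c for c in range(2)] for w in range(3)] for h in range(3)]
kernels = [[[[r + s + c + k for k in range(2)] for c in range(2)]
            for s in range(2)] for r in range(2)]
assert conv_via_matmul(image, kernels) == conv_direct(image, kernels)
```

Each pass of the inner loop is one V×C by C×K product for a single kernel pixel position; accumulating those partial products over all positions gives the full convolution.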
-
Publication number: 20220345423
Abstract: Some embodiments provide a method for a parser of a processing pipeline. The method receives a packet for processing by a set of match-action stages of the processing pipeline. The method stores packet header field (PHF) values from a first set of PHFs of the packet in a set of data containers. The first set of PHFs are for use by the match-action stages. For a second set of PHFs not used by the match-action stages, the method generates descriptive data that identifies locations of the PHFs of the second set within the packet. The method sends (i) the set of data containers to the match-action stages and (ii) the packet data and the generated descriptive data outside of the match-action stages to a deparser that uses the packet data, generated descriptive data, and the set of data containers as modified by the match-action stages to reconstruct a modified packet.
Type: Application
Filed: July 8, 2022
Publication date: October 27, 2022
Applicant: Barefoot Networks, Inc.
Inventors: Gregory C. Watson, Srivathsa Dhruvanarayan, Glen Raymond Gibb, Constantine Calamvokis, Aled Justin Edwards
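The parser and deparser split described here can be pictured with a small software model. This is a sketch under assumptions, not the hardware method: the header layout is a hypothetical name-to-(offset, length) table, the header fields are assumed to be contiguous from offset 0, modified values are assumed to keep their original length, and `parse` and `deparse` are illustrative names.

```python
def parse(packet, layout, used_fields):
    """Split header fields into containers (used) and location descriptors (unused).

    layout: dict mapping field name -> (offset, length) within the packet.
    used_fields: names of fields the match-action stages need.
    """
    containers = {}   # field name -> value, sent to the match-action stages
    descriptors = []  # (offset, length) of fields that bypass the stages
    for name, (offset, length) in layout.items():
        if name in used_fields:
            containers[name] = packet[offset:offset + length]
        else:
            descriptors.append((offset, length))
    return containers, descriptors


def deparse(packet, layout, containers, descriptors):
    """Rebuild the packet from modified containers plus the untouched fields."""
    pieces = {}
    for name, value in containers.items():
        pieces[layout[name][0]] = value            # possibly modified value
    for offset, length in descriptors:
        pieces[offset] = packet[offset:offset + length]  # copied from the original
    header_len = sum(length for _, length in layout.values())
    return b"".join(pieces[o] for o in sorted(pieces)) + packet[header_len:]


layout = {"dst": (0, 2), "src": (2, 2), "ttl": (4, 1), "proto": (5, 1)}
packet = b"\x01\x02\x03\x04\x05\x06payload"
containers, descriptors = parse(packet, layout, used_fields={"ttl"})
containers["ttl"] = b"\x04"  # stand-in for a match-action stage decrementing TTL
rebuilt = deparse(packet, layout, containers, descriptors)
```

Only the `ttl` value travels through the stages in this model; the other fields are represented by their locations and are read back from the original packet data when the packet is reassembled.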
-
Patent number: 11474557
Abstract: In one embodiment, the present disclosure includes multichip timing synchronization circuits and methods. In one embodiment, hardware counters in different systems are synchronized. Programs on the systems may include synchronization instructions. A second system executes synchronization instruction, and in response thereto, synchronizes a local software counter to a local hardware counter. The software counter on the second system may be delayed a fixed period of time corresponding to a program delay on the first system. The software counter on the second system may further be delayed by an offset to bring software counters on the two systems into sync.
Type: Grant
Filed: September 15, 2020
Date of Patent: October 18, 2022
Assignee: GROQ, INC.
Inventors: Gregory Michael Thorson, Srivathsa Dhruvanarayan
-
Patent number: 11425058
Abstract: Some embodiments provide a method for a parser of a processing pipeline. The method receives a packet for processing by a set of match-action stages of the processing pipeline. The method stores packet header field (PHF) values from a first set of PHFs of the packet in a set of data containers. The first set of PHFs are for use by the match-action stages. For a second set of PHFs not used by the match-action stages, the method generates descriptive data that identifies locations of the PHFs of the second set within the packet. The method sends (i) the set of data containers to the match-action stages and (ii) the packet data and the generated descriptive data outside of the match-action stages to a deparser that uses the packet data, generated descriptive data, and the set of data containers as modified by the match-action stages to reconstruct a modified packet.
Type: Grant
Filed: May 20, 2020
Date of Patent: August 23, 2022
Assignee: Barefoot Networks, Inc.
Inventors: Gregory C. Watson, Srivathsa Dhruvanarayan, Glen Raymond Gibb, Constantine Calamvokis, Aled Justin Edwards
-
Patent number: 11403519
Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.
Type: Grant
Filed: April 6, 2020
Date of Patent: August 2, 2022
Assignee: SiMa Technologies, Inc.
Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland
-
Patent number: 11354570
Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.
Type: Grant
Filed: April 6, 2020
Date of Patent: June 7, 2022
Assignee: SiMa Technologies, Inc.
Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland
-
Patent number: 11321607
Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.
Type: Grant
Filed: April 3, 2020
Date of Patent: May 3, 2022
Assignee: SiMa Technologies, Inc.
Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S. J Attia, Spenser Don Gilliland
-
Publication number: 20210342673
Abstract: A compiler generates a computer program implementing a machine learning network on a machine learning accelerator (MLA) including interconnected processing elements. The computer program includes data transfer instructions for non-colliding data transfers between the processing elements. To generate the data transfer instructions, the compiler determines non-conflicting data transfer paths for data transfers based on a topology of the interconnections between processing elements, on dependencies of the instructions and on a duration for execution of the instructions. Each data transfer path specifies a routing and a time slot for the data transfer. The compiler generates data transfer instructions that specify routing of the data transfers and generates a static schedule that schedules execution of the data transfer instructions during the time slots for the data transfers.
Type: Application
Filed: May 1, 2020
Publication date: November 4, 2021
Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler
-
Publication number: 20210326750
Abstract: Convolutions of an input sample with multiple kernels is decomposed into matrix multiplications of a V×C matrix of input values times a C×K matrix of kernel values, producing a V×K product. For the second matrix, C is a channel dimension (i.e., each row of the second matrix is a different channel of the input sample and kernel) and K is the kernel dimension (i.e., each column of the second matrix is a different kernel), but all the values correspond to the same pixel position in the kernel. In the matrix product, V is the output dimension and K is the kernel dimension. Thus, each value in the output matrix is a partial product for a certain output pixel and kernel, and the matrix multiplication parallelizes the convolutions by calculating partial products for multiple output pixels and multiple kernels.
Type: Application
Filed: April 21, 2020
Publication date: October 21, 2021
Inventors: Nishit Shah, Srivathsa Dhruvanarayan
-
Publication number: 20210326189
Abstract: A method, system, and apparatus are disclosed herein for bridging a deterministic phase of instructions with a non-deterministic phase of instructions when those instructions are executed by a machine learning accelerator while executing a machine learning network. In the non-deterministic phase, data and instructions are transferred from off-chip memory to on-chip memory. When the transfer is complete, processing elements are synchronized and, upon synchronization, a deterministic phase of instructions is executed by the processing elements.
Type: Application
Filed: April 17, 2020
Publication date: October 21, 2021
Inventors: Nishit Shah, Srivathsa Dhruvanarayan, Reed Kotler
-
Publication number: 20210319307
Abstract: A system-on-chip (SoC) integrated circuit product includes a machine learning accelerator (MLA). It also includes other processor cores, such as general purpose processors and application-specific processors. It also includes a network-on-chip for communication between the different modules. The SoC implements a heterogeneous compute environment because the processor cores are customized for different purposes and typically will use different instruction sets. Applications may use some or all of the functionalities offered by the processor cores, and the processor cores may be programmed into different pipelines to perform different tasks.
Type: Application
Filed: April 10, 2020
Publication date: October 14, 2021
Inventors: Srivathsa Dhruvanarayan, Nishit Shah, Bradley Taylor, Moenes Zaher Iskarous
-
Publication number: 20210312322
Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.
Type: Application
Filed: April 6, 2020
Publication date: October 7, 2021
Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S.J Attia, Spenser Don Gilliland
-
Publication number: 20210312267
Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.
Type: Application
Filed: April 6, 2020
Publication date: October 7, 2021
Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S.J Attia, Spenser Don Gilliland
-
Publication number: 20210312320
Abstract: A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The computer program includes statically scheduled instructions that are executed by a mesh of processing elements (Tiles). The instructions executed by the Tiles are statically scheduled because the compiler can determine which instructions are executed by which Tiles at what times. For example, for the statically scheduled instructions, there are no conditions, branching or data dependencies that can be resolved only at run-time, and which would affect the timing and order of the execution of the instructions.
Type: Application
Filed: April 3, 2020
Publication date: October 7, 2021
Inventors: Nishit Shah, Reed Kotler, Srivathsa Dhruvanarayan, Moenes Zaher Iskarous, Kavitha Prasad, Yogesh Laxmikant Chobe, Sedny S.J Attia, Spenser Don Gilliland
-
Patent number: 11115147
Abstract: Embodiments of the present disclosure pertain to improved circuit and system architectures for identifying and managing operating statuses and faults in a system having multiple processing circuit chips. Each of the multiple processing circuit chips includes multiple signal rings, one to provide internal communications among circuitry within the circuit chip, and another with inter-chip communications circuitry to provide communications with neighboring circuit chips. One of the multiple processing circuit chips further includes external communications circuitry to provide communications with an external host.
Type: Grant
Filed: January 9, 2019
Date of Patent: September 7, 2021
Assignee: Groq, Inc.
Inventors: Matthew Pond Baker, Srivathsa Dhruvanarayan, Boone Jared Severson
-
Publication number: 20210105220
Abstract: Some embodiments provide a method for a hardware forwarding element that includes multiple queues. The method receives a packet at a multi-stage processing pipeline of the hardware forwarding element. The method determines, at one of the stages of the processing pipeline, to modify a setting of a particular one of the queues. The method stores an identifier for the particular queue and instructions to modify the queue setting with data passed through the processing pipeline for the packet. The stored information is subsequently used by the hardware forwarding element to modify the queue setting.
Type: Application
Filed: October 16, 2020
Publication date: April 8, 2021
Inventors: Jeongkeun LEE, Yi LI, Michael FENG, Srivathsa Dhruvanarayan, Anurag AGRAWAL
-
Patent number: 10949199
Abstract: Some embodiments provide a method for a network forwarding integrated circuit (IC). The method receives packet data with an instruction to copy a portion of the packet data to a temporary storage of the network forwarding IC. The portion is larger than a maximum entry size of the temporary storage. The method generates a header for each of multiple packet data sections for storage in entries of the temporary storage, with each packet data section including a sub-portion of the packet data portion. The method sends the packet data sections with the generated headers to the temporary storage for storage in multiple separate temporary storage entries.
Type: Grant
Filed: December 8, 2017
Date of Patent: March 16, 2021
Assignee: Barefoot Networks, Inc.
Inventors: Xiaozhou Li, Jeongkeun Lee, Srivathsa Dhruvanarayan, Anurag Agrawal, Changhoon Kim, Alain Loge
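Splitting an oversized packet data portion into headered sections can be modelled in a few lines. The sketch below is an assumption-laden illustration: the abstract does not say what the generated headers contain, so the `SectionHeader` fields (`copy_id`, `index`, `is_last`, `length`) are hypothetical, as is the choice to count the header against the maximum entry size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionHeader:
    """Hypothetical per-section header; the real header format is not given."""
    copy_id: int   # identifies which copy operation the section belongs to
    index: int     # position of this section within the copied portion
    is_last: bool  # marks the final section so a reader knows when to stop
    length: int    # number of payload bytes in this section


def split_for_storage(data, max_entry_size, header_size, copy_id):
    """Split data into (header, payload) sections that each fit one storage entry.

    Assumption: header_size bytes of every entry are taken up by the header.
    """
    payload_size = max_entry_size - header_size
    if payload_size <= 0:
        raise ValueError("entry too small to hold a header plus payload")
    sections = []
    for index, start in enumerate(range(0, len(data), payload_size)):
        chunk = data[start:start + payload_size]
        is_last = start + payload_size >= len(data)
        sections.append((SectionHeader(copy_id, index, is_last, len(chunk)), chunk))
    return sections


sections = split_for_storage(bytes(range(10)), max_entry_size=6, header_size=2, copy_id=7)
reassembled = b"".join(payload for _, payload in sections)
```

With these numbers each entry carries four payload bytes, so the ten-byte portion occupies three separate entries, and the `index` and `is_last` fields are enough to put the sections back in order.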