Patents by Inventor Nagendra Gulur

Nagendra Gulur has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240103875
    Abstract: In one example, a neural network processor comprises a memory interface, an instruction buffer, a weights buffer, an input data register, a weights register, an output data register, a computing engine, and a controller. The controller is configured to: receive a first instruction from the instruction buffer; responsive to the first instruction, fetch input data elements from the memory interface to the input data register, and fetch weight elements from the weights buffer to the weights register. The controller is also configured to: receive a second instruction from the instruction buffer; and responsive to the second instruction: fetch the input data elements and the weight elements from, respectively, the input data register and the weights register to the computing engine; and perform, using the computing engine, computation operations between the input data elements and the weight elements to generate output data elements. (See the sketch after this entry.)
    Type: Application
    Filed: July 20, 2023
    Publication date: March 28, 2024
    Applicant: Texas Instruments Incorporated
    Inventors: Mahesh M Mehendale, Nagendra Gulur, Srinivasa BS Chakravarthy, Atul Lele, Hetul Sanghvi
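    A minimal C sketch of the two-instruction flow this abstract describes: one instruction stages operands into the input and weights registers, and a second streams the registers into the computing engine. The names, the 4-element tile width, and the elementwise multiply used as the computation are illustrative assumptions, not the patented design.

      #include <stdint.h>
      #include <stddef.h>

      #define TILE 4                     /* assumed register width */

      typedef struct {
          int8_t  input_reg[TILE];       /* input data register  */
          int8_t  weight_reg[TILE];      /* weights register     */
          int32_t output_reg[TILE];      /* output data register */
      } nn_regs;

      /* First instruction: fetch operands into the registers. */
      static void do_fetch(nn_regs *r, const int8_t *mem, const int8_t *wbuf) {
          for (size_t i = 0; i < TILE; i++) {
              r->input_reg[i]  = mem[i];   /* via the memory interface */
              r->weight_reg[i] = wbuf[i];  /* from the weights buffer  */
          }
      }

      /* Second instruction: stream the registers into the computing engine;
       * an elementwise multiply stands in for the abstract's unspecified
       * "computation operations". */
      static void do_compute(nn_regs *r) {
          for (size_t i = 0; i < TILE; i++)
              r->output_reg[i] = (int32_t)r->input_reg[i] * r->weight_reg[i];
      }

    Splitting fetch and compute into separate instructions lets the controller overlap the next operand fetch with the current computation, which is the apparent point of the decoupled register stages.
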
  • Publication number: 20240103811
    Abstract: In one example, a neural network processor comprises an input data register, a weights register, a computing engine configurable to perform multiplication and accumulation (MAC) operations between input data elements of a range of input precisions and weight elements of a range of weight precisions, and a controller. The controller is configured to: receive a first indication of a particular input precision and a second indication of a particular weight precision, and configure the computing engine based on the first and second indications. The controller is also configured to, responsive to an instruction: fetch input data elements and weight elements to the computing engine; and perform, using the computing engine configured based on the first and second indications, MAC operations between the input data elements at the particular input precision and the weight elements at the particular weight precision to generate intermediate output data elements. (See the sketch after this entry.)
    Type: Application
    Filed: July 20, 2023
    Publication date: March 28, 2024
    Applicant: Texas Instruments Incorporated
    Inventors: Mahesh M Mehendale, Atul Lele, Nagendra Gulur, Hetul Sanghvi, Srinivasa BS Chakravarthy
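    A C sketch of a precision-configurable MAC, assuming the engine handles mixed precisions by sign-extending narrow operands into a wide accumulator. The helper name sext, the 8-bit element storage, and the supported precision range are assumptions, not the patented mechanism.

      #include <stdint.h>
      #include <stddef.h>

      /* Sign-extend the low `bits` bits of x (assumed helper; valid for
       * bits in 1..31). */
      static int32_t sext(int32_t x, unsigned bits) {
          uint32_t m = 1u << (bits - 1);
          uint32_t v = (uint32_t)x & ((1u << bits) - 1);
          return (int32_t)((v ^ m) - m);
      }

      /* MAC over n elements at configurable input/weight precisions
       * (e.g. 4- or 8-bit), accumulating into a wide intermediate. */
      static int32_t mac(const int8_t *in, const int8_t *w, size_t n,
                         unsigned in_bits, unsigned w_bits) {
          int32_t acc = 0;
          for (size_t i = 0; i < n; i++)
              acc += sext(in[i], in_bits) * sext(w[i], w_bits);
          return acc;
      }
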
  • Publication number: 20240104361
    Abstract: In one example, a neural network processor comprises a computing engine and a post-processing engine, the post-processing engine configurable to perform different post-processing operations for a range of output precisions and a range of weight precisions. The neural network processor further comprises a controller configured to: receive a first indication of a particular output precision, a second indication of a particular weight precision, and first and second post-processing parameters; and configure the post-processing engine based on the first and second indications and the first and second post-processing parameters. The controller is further configured to, responsive to a first instruction, perform, using the computing engine, multiplication and accumulation operations between input data elements and weight elements to generate intermediate data elements. (See the sketch after this entry.)
    Type: Application
    Filed: July 20, 2023
    Publication date: March 28, 2024
    Applicant: Texas Instruments Incorporated
    Inventors: Mahesh M Mehendale, Hetul Sanghvi, Nagendra Gulur, Atul Lele, Srinivasa BS Chakravarthy
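    A C sketch of one plausible post-processing step: requantizing a wide intermediate accumulator to a configured output precision with a fixed-point multiply, shift, and saturate. The parameter names and the multiply-shift scheme are assumptions; the abstract says only that the engine is configured by precision indications and post-processing parameters.

      #include <stdint.h>

      typedef struct {
          int32_t  mult;      /* assumed parameter: fixed-point scale  */
          unsigned shift;     /* assumed parameter: right-shift amount */
          unsigned out_bits;  /* configured output precision, e.g. 8   */
      } pp_cfg;

      /* Clamp a wide value into a signed `bits`-bit range. */
      static int32_t saturate(int64_t v, unsigned bits) {
          int64_t lo = -(1ll << (bits - 1));
          int64_t hi =  (1ll << (bits - 1)) - 1;
          return (int32_t)(v < lo ? lo : v > hi ? hi : v);
      }

      /* Requantize one intermediate MAC result to the output precision. */
      static int32_t post_process(int32_t acc, const pp_cfg *cfg) {
          int64_t scaled = ((int64_t)acc * cfg->mult) >> cfg->shift;
          return saturate(scaled, cfg->out_bits);
      }
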
  • Patent number: 10296465
    Abstract: A processor architecture utilizing an L3 translation lookaside buffer (TLB) to reduce page walks. The processor includes multiple cores, where each core includes an L1 TLB and an L2 TLB. The processor further includes an L3 TLB that is shared across the processor cores and implemented in off-chip or die-stacked dynamic random-access memory. Furthermore, the processor includes a page table connected to the L3 TLB, where the page table stores a mapping between virtual addresses and physical addresses. In such an architecture, the very large capacity of the L3 TLB may improve performance, such as execution time, by eliminating page walks, which require multiple memory accesses. (See the sketch after this entry.)
    Type: Grant
    Filed: July 20, 2017
    Date of Patent: May 21, 2019
    Assignee: Board of Regents, The University of Texas System
    Inventors: Lizy K. John, Jee Ho Ryoo, Nagendra Gulur
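    A C sketch of the translation path the abstract describes: per-core L1 and L2 TLB lookups, then the large shared DRAM-resident L3 TLB, with a multi-access page-table walk only on an L3 miss. The lookup and walk helpers are assumed stubs, not the patented hardware.

      #include <stdbool.h>
      #include <stdint.h>

      typedef uint64_t vaddr_t;
      typedef uint64_t paddr_t;

      /* Assumed stubs standing in for the hardware structures. */
      bool    l1_lookup(int core, vaddr_t v, paddr_t *p); /* per-core L1 TLB */
      bool    l2_lookup(int core, vaddr_t v, paddr_t *p); /* per-core L2 TLB */
      bool    l3_lookup(vaddr_t v, paddr_t *p); /* shared DRAM-resident L3   */
      paddr_t page_walk(vaddr_t v);             /* multi-access table walk   */
      void    l3_fill(vaddr_t v, paddr_t p);

      paddr_t translate(int core, vaddr_t v) {
          paddr_t p;
          if (l1_lookup(core, v, &p)) return p;
          if (l2_lookup(core, v, &p)) return p;
          if (l3_lookup(v, &p)) return p; /* huge capacity: most misses
                                             stop here                   */
          p = page_walk(v);               /* expensive: multiple memory
                                             accesses                    */
          l3_fill(v, p);
          return p;
      }

    The performance claim in the abstract corresponds to the third branch: with a sufficiently large L3 TLB, translations that miss the on-chip TLBs still rarely reach page_walk().
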
  • Patent number: 10261915
    Abstract: A processor architecture that partitions on-chip data caches to efficiently cache translation entries alongside data, reducing conflicts between virtual-to-physical address translation and data accesses. The architecture includes processor cores that include a first-level translation lookaside buffer (TLB) and a second-level TLB located either internally within each processor core or shared across the processor cores. Furthermore, the architecture includes a second-level data cache (e.g., located either internally within each processor core or shared across the processor cores) partitioned to store both data and translation entries. Furthermore, the architecture includes a third-level data cache connected to the processor cores, where the third-level data cache is partitioned to store both data and translation entries. The third-level data cache is shared across the processor cores. The processor architecture can also include a data stack distance profiler and a translation stack distance profiler. (See the sketch after this entry.)
    Type: Grant
    Filed: September 15, 2017
    Date of Patent: April 16, 2019
    Assignee: Board of Regents, The University of Texas System
    Inventors: Lizy K. John, Yashwant Marathe, Jee Ho Ryoo, Nagendra Gulur
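    A C sketch of a way-partitioned set-associative cache holding both data and translation entries, in the spirit of this abstract. The 8-way geometry, field names, and victim policy are assumptions; in the patent, the partition point would be informed by the data and translation stack distance profilers.

      #include <stdbool.h>
      #include <stdint.h>

      #define WAYS 8  /* assumed associativity */

      typedef enum { ENTRY_DATA, ENTRY_XLATE } entry_kind;

      typedef struct {
          bool       valid;
          entry_kind kind;
          uint64_t   tag;
      } cache_line;

      typedef struct {
          cache_line way[WAYS];
          unsigned   xlate_ways; /* ways [0, xlate_ways) hold translation
                                    entries; the split would be tuned by
                                    the stack distance profilers          */
      } cache_set;

      /* A fill of a given kind may only victimize ways in its partition,
       * so translation entries and data never evict each other. */
      static unsigned pick_victim(const cache_set *s, entry_kind kind) {
          unsigned lo = (kind == ENTRY_XLATE) ? 0 : s->xlate_ways;
          unsigned hi = (kind == ENTRY_XLATE) ? s->xlate_ways : WAYS;
          for (unsigned w = lo; w < hi; w++)
              if (!s->way[w].valid) return w;
          return lo; /* all valid: evict the first way (stand-in policy) */
      }
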
  • Publication number: 20190087350
    Abstract: A processor architecture that partitions the on-chip data caches to efficiently cache translation entries alongside data, reducing the conflicts between virtual-to-physical address translation and data accesses. The architecture includes processor cores that include a first-level translation lookaside buffer (TLB) and a second-level TLB located either internally within each processor core or shared across the processor cores. Furthermore, the architecture includes a second-level data cache (e.g., located either internally within each processor core or shared across the processor cores) partitioned to store both data and translation entries. Furthermore, the architecture includes a third-level data cache connected to the processor cores, where the third-level data cache is partitioned to store both data and translation entries. The third-level data cache is shared across the processor cores.
    Type: Application
    Filed: September 15, 2017
    Publication date: March 21, 2019
    Inventors: Lizy K. John, Yashwant Marathe, Jee Ho Ryoo, Nagendra Gulur
  • Publication number: 20180150406
    Abstract: A processor architecture utilizing an L3 translation lookaside buffer (TLB) to reduce page walks. The processor includes multiple cores, where each core includes an L1 TLB and an L2 TLB. The processor further includes an L3 TLB that is shared across the processor cores and implemented in off-chip or die-stacked dynamic random-access memory. Furthermore, the processor includes a page table connected to the L3 TLB, where the page table stores a mapping between virtual addresses and physical addresses. In such an architecture, the very large capacity of the L3 TLB may improve performance, such as execution time, by eliminating page walks, which require multiple memory accesses.
    Type: Application
    Filed: July 20, 2017
    Publication date: May 31, 2018
    Inventors: Lizy K. John, Jee Ho Ryoo, Nagendra Gulur