Patents by Inventor Nagendra Gulur

Nagendra Gulur has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240103875
    Abstract: In one example, a neural network processor comprises a memory interface, an instruction buffer, a weights buffer, an input data register, a weights register, an output data register, a computing engine, and a controller. The controller is configured to: receive a first instruction from the instruction buffer; responsive to the first instruction, fetch input data elements from the memory interface to the input data register, and fetch weight elements from the weights buffer to the weights register. The controller is also configured to: receive a second instruction from the instruction buffer; and responsive to the second instruction: fetch the input data elements and the weight elements from, respectively, the input data register and the weights register to the computing engine; and perform, using the computing engine, computation operations between the input data elements and the weight elements to generate output data elements. (See the sketch after this entry.)
    Type: Application
    Filed: July 20, 2023
    Publication date: March 28, 2024
    Applicant: Texas Instruments Incorporated
    Inventors: Mahesh M Mehendale, Nagendra Gulur, Srinivasa BS Chakravarthy, Atul Lele, Hetul Sanghvi
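    A minimal C sketch of the two-instruction flow this abstract describes: one instruction stages operands into the input and weights registers, and a second streams the registers into the computing engine. The names, the 4-element tile width, and the elementwise multiply used as the computation are illustrative assumptions, not the patented design.

      #include <stdint.h>
      #include <stddef.h>

      #define TILE 4                     /* assumed register width */

      typedef struct {
          int8_t  input_reg[TILE];       /* input data register  */
          int8_t  weight_reg[TILE];      /* weights register     */
          int32_t output_reg[TILE];      /* output data register */
      } nn_regs;

      /* First instruction: fetch operands into the registers. */
      static void do_fetch(nn_regs *r, const int8_t *mem, const int8_t *wbuf) {
          for (size_t i = 0; i < TILE; i++) {
              r->input_reg[i]  = mem[i];   /* via the memory interface */
              r->weight_reg[i] = wbuf[i];  /* from the weights buffer  */
          }
      }

      /* Second instruction: stream the registers into the computing engine;
       * an elementwise multiply stands in for the abstract's unspecified
       * "computation operations". */
      static void do_compute(nn_regs *r) {
          for (size_t i = 0; i < TILE; i++)
              r->output_reg[i] = (int32_t)r->input_reg[i] * r->weight_reg[i];
      }

    Splitting fetch and compute into separate instructions lets the controller overlap the next operand fetch with the current computation, which is the apparent point of the decoupled register stages.
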
  • Publication number: 20240103811
    Abstract: In one example, a neural network processor comprises an input data register, a weights register, a computing engine configurable to perform multiplication and accumulation (MAC) operations between input data elements of a range of input precisions and weight elements of a range of weight precisions, and a controller. The controller is configured to: receive a first indication of a particular input precision and a second indication of a particular weight precision, and configure the computing engine based on the first and second indications. The controller is also configured to, responsive to an instruction: fetch input data elements and weight elements to the computing engine; and perform, using the computing engine configured based on the first and second indications, MAC operations between the input data elements at the particular input precision and the weight elements at the particular weight precision to generate intermediate output data elements. (See the sketch after this entry.)
    Type: Application
    Filed: July 20, 2023
    Publication date: March 28, 2024
    Applicant: Texas Instruments Incorporated
    Inventors: Mahesh M Mehendale, Atul Lele, Nagendra Gulur, Hetul Sanghvi, Srinivasa BS Chakravarthy
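    A C sketch of a precision-configurable MAC, assuming the engine handles mixed precisions by sign-extending narrow operands into a wide accumulator. The helper name sext, the 8-bit element storage, and the supported precision range are assumptions, not the patented mechanism.

      #include <stdint.h>
      #include <stddef.h>

      /* Sign-extend the low `bits` bits of x (assumed helper; valid for
       * bits in 1..31). */
      static int32_t sext(int32_t x, unsigned bits) {
          uint32_t m = 1u << (bits - 1);
          uint32_t v = (uint32_t)x & ((1u << bits) - 1);
          return (int32_t)((v ^ m) - m);
      }

      /* MAC over n elements at configurable input/weight precisions
       * (e.g. 4- or 8-bit), accumulating into a wide intermediate. */
      static int32_t mac(const int8_t *in, const int8_t *w, size_t n,
                         unsigned in_bits, unsigned w_bits) {
          int32_t acc = 0;
          for (size_t i = 0; i < n; i++)
              acc += sext(in[i], in_bits) * sext(w[i], w_bits);
          return acc;
      }
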
  • Publication number: 20240104361
    Abstract: In one example, a neural network processor comprises a computing engine and a post-processing engine, the post-processing engine configurable to perform different post-processing operations for a range of output precisions and a range of weight precisions. The neural network processor further comprises a controller configured to: receive a first indication of a particular output precision, a second indication of a particular weight precision, and first and second post-processing parameters; and configure the post-processing engine based on the first and second indications and the first and second post-processing parameters. The controller is further configured to, responsive to a first instruction, perform, using the computing engine, multiplication and accumulation operations between input data elements and weight elements to generate intermediate data elements. (See the sketch after this entry.)
    Type: Application
    Filed: July 20, 2023
    Publication date: March 28, 2024
    Applicant: Texas Instruments Incorporated
    Inventors: Mahesh M Mehendale, Hetul Sanghvi, Nagendra Gulur, Atul Lele, Srinivasa BS Chakravarthy
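    A C sketch of one plausible post-processing step: requantizing a wide intermediate accumulator to a configured output precision with a fixed-point multiply, shift, and saturate. The parameter names and the multiply-shift scheme are assumptions; the abstract says only that the engine is configured by precision indications and post-processing parameters.

      #include <stdint.h>

      typedef struct {
          int32_t  mult;      /* assumed parameter: fixed-point scale  */
          unsigned shift;     /* assumed parameter: right-shift amount */
          unsigned out_bits;  /* configured output precision, e.g. 8   */
      } pp_cfg;

      /* Clamp a wide value into a signed `bits`-bit range. */
      static int32_t saturate(int64_t v, unsigned bits) {
          int64_t lo = -(1ll << (bits - 1));
          int64_t hi =  (1ll << (bits - 1)) - 1;
          return (int32_t)(v < lo ? lo : v > hi ? hi : v);
      }

      /* Requantize one intermediate MAC result to the output precision. */
      static int32_t post_process(int32_t acc, const pp_cfg *cfg) {
          int64_t scaled = ((int64_t)acc * cfg->mult) >> cfg->shift;
          return saturate(scaled, cfg->out_bits);
      }
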
  • Patent number: 10296465
    Abstract: A processor architecture utilizing an L3 translation lookaside buffer (TLB) to reduce page walks. The processor includes multiple cores, where each core includes an L1 TLB and an L2 TLB. The processor further includes an L3 TLB that is shared across the processor cores and implemented in off-chip or die-stacked dynamic random-access memory. Furthermore, the processor includes a page table connected to the L3 TLB, where the page table stores a mapping between virtual addresses and physical addresses. In such an architecture, the very large capacity of the L3 TLB may improve performance, such as execution time, by eliminating page walks, which require multiple memory accesses. (See the sketch after this entry.)
    Type: Grant
    Filed: July 20, 2017
    Date of Patent: May 21, 2019
    Assignee: Board of Regents, The University of Texas System
    Inventors: Lizy K. John, Jee Ho Ryoo, Nagendra Gulur
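    A C sketch of the translation path the abstract describes: per-core L1 and L2 TLB lookups, then the large shared DRAM-resident L3 TLB, with a multi-access page-table walk only on an L3 miss. The lookup and walk helpers are assumed stubs, not the patented hardware.

      #include <stdbool.h>
      #include <stdint.h>

      typedef uint64_t vaddr_t;
      typedef uint64_t paddr_t;

      /* Assumed stubs standing in for the hardware structures. */
      bool    l1_lookup(int core, vaddr_t v, paddr_t *p); /* per-core L1 TLB */
      bool    l2_lookup(int core, vaddr_t v, paddr_t *p); /* per-core L2 TLB */
      bool    l3_lookup(vaddr_t v, paddr_t *p); /* shared DRAM-resident L3   */
      paddr_t page_walk(vaddr_t v);             /* multi-access table walk   */
      void    l3_fill(vaddr_t v, paddr_t p);

      paddr_t translate(int core, vaddr_t v) {
          paddr_t p;
          if (l1_lookup(core, v, &p)) return p;
          if (l2_lookup(core, v, &p)) return p;
          if (l3_lookup(v, &p)) return p; /* huge capacity: most misses
                                             stop here                   */
          p = page_walk(v);               /* expensive: multiple memory
                                             accesses                    */
          l3_fill(v, p);
          return p;
      }

    The performance claim in the abstract corresponds to the third branch: with a sufficiently large L3 TLB, translations that miss the on-chip TLBs still rarely reach page_walk().
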
  • Patent number: 10261915
    Abstract: A processor architecture that partitions on-chip data caches to efficiently cache translation entries alongside data, reducing conflicts between virtual-to-physical address translation and data accesses. The architecture includes processor cores that include a first-level translation lookaside buffer (TLB) and a second-level TLB located either internally within each processor core or shared across the processor cores. Furthermore, the architecture includes a second-level data cache (e.g., located either internally within each processor core or shared across the processor cores) partitioned to store both data and translation entries. Furthermore, the architecture includes a third-level data cache connected to the processor cores, where the third-level data cache is partitioned to store both data and translation entries. The third-level data cache is shared across the processor cores. The processor architecture can also include a data stack distance profiler and a translation stack distance profiler. (See the sketch after this entry.)
    Type: Grant
    Filed: September 15, 2017
    Date of Patent: April 16, 2019
    Assignee: Board of Regents, The University of Texas System
    Inventors: Lizy K. John, Yashwant Marathe, Jee Ho Ryoo, Nagendra Gulur
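    A C sketch of a way-partitioned set-associative cache holding both data and translation entries, in the spirit of this abstract. The 8-way geometry, field names, and victim policy are assumptions; in the patent, the partition point would be informed by the data and translation stack distance profilers.

      #include <stdbool.h>
      #include <stdint.h>

      #define WAYS 8  /* assumed associativity */

      typedef enum { ENTRY_DATA, ENTRY_XLATE } entry_kind;

      typedef struct {
          bool       valid;
          entry_kind kind;
          uint64_t   tag;
      } cache_line;

      typedef struct {
          cache_line way[WAYS];
          unsigned   xlate_ways; /* ways [0, xlate_ways) hold translation
                                    entries; the split would be tuned by
                                    the stack distance profilers          */
      } cache_set;

      /* A fill of a given kind may only victimize ways in its partition,
       * so translation entries and data never evict each other. */
      static unsigned pick_victim(const cache_set *s, entry_kind kind) {
          unsigned lo = (kind == ENTRY_XLATE) ? 0 : s->xlate_ways;
          unsigned hi = (kind == ENTRY_XLATE) ? s->xlate_ways : WAYS;
          for (unsigned w = lo; w < hi; w++)
              if (!s->way[w].valid) return w;
          return lo; /* all valid: evict the first way (stand-in policy) */
      }
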
  • Publication number: 20190087350
    Abstract: A processor architecture that partitions the on-chip data caches to efficiently cache translation entries alongside data, reducing the conflicts between virtual-to-physical address translation and data accesses. The architecture includes processor cores that include a first-level translation lookaside buffer (TLB) and a second-level TLB located either internally within each processor core or shared across the processor cores. Furthermore, the architecture includes a second-level data cache (e.g., located either internally within each processor core or shared across the processor cores) partitioned to store both data and translation entries. Furthermore, the architecture includes a third-level data cache connected to the processor cores, where the third-level data cache is partitioned to store both data and translation entries. The third-level data cache is shared across the processor cores.
    Type: Application
    Filed: September 15, 2017
    Publication date: March 21, 2019
    Inventors: Lizy K. John, Yashwant Marathe, Jee Ho Ryoo, Nagendra Gulur
  • Publication number: 20180150406
    Abstract: A processor architecture utilizing an L3 translation lookaside buffer (TLB) to reduce page walks. The processor includes multiple cores, where each core includes an L1 TLB and an L2 TLB. The processor further includes an L3 TLB that is shared across the processor cores and implemented in off-chip or die-stacked dynamic random-access memory. Furthermore, the processor includes a page table connected to the L3 TLB, where the page table stores a mapping between virtual addresses and physical addresses. In such an architecture, the very large capacity of the L3 TLB may improve performance, such as execution time, by eliminating page walks, which require multiple memory accesses.
    Type: Application
    Filed: July 20, 2017
    Publication date: May 31, 2018
    Inventors: Lizy K. John, Jee Ho Ryoo, Nagendra Gulur