Patents by Inventor Mihir Mody
Mihir Mody has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11662211Abstract: Information communication circuitry, including a first integrated circuit for coupling to a second integrated circuit in a package on package configuration. The first integrated circuit comprises processing circuitry for communicating information bits, and the information bits comprise data bits and error correction bits, where the error correction bits are for indicating whether data bits are received correctly. The second integrated circuit comprises a memory for receiving and storing at least some of the information bits. The information communication circuitry also includes interfacing circuitry for selectively communicating, along a number of conductors, between the package on package configuration. In a first instance, the interfacing circuitry selectively communicates only data bits along the number of conductors.Type: GrantFiled: August 3, 2020Date of Patent: May 30, 2023Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Rahul Gulati, Aishwarya Dubey, Nainala Vyagrheswarudu, Vasant Easwaran, Prashant Dinkar Karandikar, Mihir Mody
-
Patent number: 11615043Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed herein to enable data aggregation and pattern adaptation in hardware acceleration subsystems. In some examples, a hardware acceleration subsystem includes a first scheduler, a first hardware accelerator coupled to the first scheduler to process at least a first data element and a second data element, and a first load store engine coupled to the first hardware accelerator, the first load store engine configured to communicate with the first scheduler at a superblock level by sending a done signal to the first scheduler in response to determining that a block count is equal to a first BPR value and aggregate the first data element and the second data element based on the first BPR value to generate a first aggregated data element.Type: GrantFiled: December 31, 2020Date of Patent: March 28, 2023Assignee: Texas Instruments IncorporatedInventors: Niraj Nandan, Rajasekhar Reddy Allu, Brian Chae, Mihir Mody
-
Patent number: 11586465Abstract: A device includes a hardware data processing node configured to execute a respective task, and a hardware thread scheduler including a hardware task scheduler. The hardware task scheduler is coupled to the hardware data processing node and has a producer socket, a consumer socket, and a spare socket. The spare socket is configured to provide data control signals also provided by a first socket of the producer and consumer sockets responsive to a memory-mapped register being a first value. The spare socket is configured to provide data control signals also provided by a second socket of the producer and consumer sockets responsive to the memory-mapped register being a second value.Type: GrantFiled: December 30, 2020Date of Patent: February 21, 2023Assignee: Texas Instruments IncorporatedInventors: Niraj Nandan, Mihir Mody
-
Publication number: 20220114120Abstract: A processing accelerator includes a shared memory, and a stream accelerator, a memory-to-memory accelerator, and a common DMA controller coupled to the shared memory. The stream accelerator is configured to process a real-time data stream, and to store stream accelerator output data generated by processing the real-time data stream in the shared memory. The memory-to-memory accelerator is configured to retrieve input data from the shared memory, to process the input data, and to store, in the shared memory, memory-to-memory accelerator output data generated by processing the input data. The common DMA controller is configured to retrieve stream accelerator output data from the shared memory and transfer the stream accelerator output data to memory external to the processing accelerator; and to retrieve the memory-to-memory accelerator output data from the shared memory and transfer the memory-to-memory accelerator output data to memory external to the processing accelerator.Type: ApplicationFiled: December 21, 2021Publication date: April 14, 2022Inventors: Mihir MODY, Niraj NANDAN, Hetul SANGHVI, Brian CHAE, Rajasekhar Reddy ALLU, Jason A.T. JONES, Anthony LELL, Anish REGHUNATH
-
Patent number: 11237991Abstract: A processing accelerator includes a shared memory, and a stream accelerator, a memory-to-memory accelerator, and a common DMA controller coupled to the shared memory. The stream accelerator is configured to process a real-time data stream, and to store stream accelerator output data generated by processing the real-time data stream in the shared memory. The memory-to-memory accelerator is configured to retrieve input data from the shared memory, to process the input data, and to store, in the shared memory, memory-to-memory accelerator output data generated by processing the input data. The common DMA controller is configured to retrieve stream accelerator output data from the shared memory and transfer the stream accelerator output data to memory external to the processing accelerator; and to retrieve the memory-to-memory accelerator output data from the shared memory and transfer the memory-to-memory accelerator output data to memory external to the processing accelerator.Type: GrantFiled: August 17, 2020Date of Patent: February 1, 2022Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Mihir Mody, Niraj Nandan, Hetul Sanghvi, Brian Chae, Rajasekhar Reddy Allu, Jason A. T. Jones, Anthony Lell, Anish Reghunath
-
Publication number: 20220012312Abstract: In some examples, a system includes storage storing a machine learning model, wherein the machine learning model comprises a plurality of layers comprising multiple weights. The system also includes a processing unit coupled to the storage and operable to group the weights in each layer into a plurality of partitions; determine a number of least significant bits to be used for watermarking in each of the plurality of partitions; insert one or more watermark bits into the determined least significant bits for each of the plurality of partitions; and scramble one or more of the weight bits to produce watermarked and scrambled weights. The system also includes an output device to provide the watermarked and scrambled weights to another device.Type: ApplicationFiled: September 28, 2021Publication date: January 13, 2022Inventors: Deepak Kumar PODDAR, Mihir MODY, Veeramanikandan RAJU, Jason A.T. JONES
-
Publication number: 20210397528Abstract: A system to implement debugging for a multi-threaded processor is provided. The system includes a hardware thread scheduler configured to schedule processing of data, and a plurality of schedulers, each configured to schedule a given pipeline for processing instructions. The system further includes a debug control configured to control at least one of the plurality of schedulers to halt, step, or resume the given pipeline of the at least one of the plurality of schedulers for the data to enable debugging thereof. The system further includes a plurality of hardware accelerators configured to implement a series of tasks in accordance with a schedule provided by a respective scheduler in accordance with a command from the debug control. Each of the plurality of hardware accelerators is coupled to at least one of the plurality of schedulers to execute the instructions for the given pipeline and to a shared memory.Type: ApplicationFiled: August 31, 2021Publication date: December 23, 2021Inventors: NIRAJ NANDAN, HETUL SANGHVI, MIHIR MODY, GARY COOPER, ANTHONY LELL
-
Patent number: 11163861Abstract: In some examples, a system includes storage storing a machine learning model, wherein the machine learning model comprises a plurality of layers comprising multiple weights. The system also includes a processing unit coupled to the storage and operable to group the weights in each layer into a plurality of partitions; determine a number of least significant bits to be used for watermarking in each of the plurality of partitions; insert one or more watermark bits into the determined least significant bits for each of the plurality of partitions; and scramble one or more of the weight bits to produce watermarked and scrambled weights. The system also includes an output device to provide the watermarked and scrambled weights to another device.Type: GrantFiled: November 13, 2018Date of Patent: November 2, 2021Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Deepak Kumar Poddar, Mihir Mody, Veeramanikandan Raju, Jason A. T. Jones
-
Publication number: 20210326276Abstract: Methods and apparatus to extend local buffer of a hardware accelerator are disclosed herein. In some examples, an apparatus, including a local memory, a first hardware accelerator (HWA), a second HWA, the second HWA and the first HWA connected in a flexible data pipeline, and a spare scheduler to manage, in response to the spare scheduler inserted in the flexible data pipeline, data movement between the first HWA and the second HWA through the local memory and a memory. Local buffer extension may be performed by software to control data movement between local memory and other system memory. The other system memory may be on-chip memory and/or external memory. The HWA sub-system includes a set of spare schedulers to manage the data movement. Data aggregation may be performed in the other system memory. Additionally, the other system memory may be utilized for conversion between data line and data block.Type: ApplicationFiled: December 30, 2020Publication date: October 21, 2021Inventors: Niraj Nandan, Mihir Mody, Rajat Sagar
-
Publication number: 20210326229Abstract: Methods, apparatus, systems and articles of manufacture for an example event processor are disclosed to retrieve an input event and an input event timestamp corresponding to the input event, generate an output event based on the input event and the input event timestamp, in response to determination that an input event threshold is exceeded within a threshold of time, and an anomaly detector to retrieve the output event, determine whether the output event indicates threat to functional safety of a system on a chip, and in response to determining the output event indicates threat to functional safety of the system on a chip, adapt a process for the system on a chip to preserve functional safety.Type: ApplicationFiled: December 28, 2020Publication date: October 21, 2021Inventors: Rajat Sagar, Niraj Nandan, Kedar Chitnis, Brijesh Jadav, Mihir Mody
-
Publication number: 20210326174Abstract: A device includes a hardware data processing node configured to execute a respective task, and a hardware thread scheduler including a hardware task scheduler. The hardware task scheduler is coupled to the hardware data processing node and has a producer socket, a consumer socket, and a spare socket. The spare socket is configured to provide data control signals also provided by a first socket of the producer and consumer sockets responsive to a memory-mapped register being a first value. The spare socket is configured to provide data control signals also provided by a second socket of the producer and consumer sockets responsive to the memory-mapped register being a second value.Type: ApplicationFiled: December 30, 2020Publication date: October 21, 2021Inventors: Niraj NANDAN, Mihir MODY
-
Patent number: 11144417Abstract: A system to implement debugging for a multi-threaded processor is provided. The system includes a hardware thread scheduler configured to schedule processing of data, and a plurality of schedulers, each configured to schedule a given pipeline for processing instructions. The system further includes a debug control configured to control at least one of the plurality of schedulers to halt, step, or resume the given pipeline of the at least one of the plurality of schedulers for the data to enable debugging thereof. The system further includes a plurality of hardware accelerators configured to implement a series of tasks in accordance with a schedule provided by a respective scheduler in accordance with a command from the debug control. Each of the plurality of hardware accelerators is coupled to at least one of the plurality of schedulers to execute the instructions for the given pipeline and to a shared memory.Type: GrantFiled: December 31, 2018Date of Patent: October 12, 2021Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Niraj Nandan, Hetul Sanghvi, Mihir Mody, Gary Cooper, Anthony Lell
-
Publication number: 20210209390Abstract: An image data frame is received from an external source. An error concealment operation is performed on the received image data frame in response to determining that a first frame size of the received image data frame is erroneous. The first frame size of the image data frame is determined to be erroneous based on at least one frame synchronization signal associated with the image data frame. An image processing operation is performed on the received image data frame on which the error concealment operation has been performed, thereby enabling an image processing module to perform the image processing operation without entering into a deadlock state and thereby prevent a host processor from having to execute hardware resets of deadlocked modules.Type: ApplicationFiled: January 17, 2020Publication date: July 8, 2021Inventors: Brian Okchon CHAE, Niraj NANDAN, Anthony Joseph LELL, Mihir MODY
-
Publication number: 20210209041Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed herein to enable data aggregation and pattern adaptation in hardware acceleration subsystems. In some examples, a hardware acceleration subsystem includes a first scheduler, a first hardware accelerator coupled to the first scheduler to process at least a first data element and a second data element, and a first load store engine coupled to the first hardware accelerator, the first load store engine configured to communicate with the first scheduler at a superblock level by sending a done signal to the first scheduler in response to determining that a block count is equal to a first BPR value and aggregate the first data element and the second data element based on the first BPR value to generate a first aggregated data element.Type: ApplicationFiled: December 31, 2020Publication date: July 8, 2021Inventors: Niraj Nandan, Rajasekhar Reddy Allu, Brian Chae, Mihir Mody
-
Publication number: 20210136358Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for image frame freeze detection. An example hardware accelerator includes a core logic circuit to generate second image data based on first image data associated with a first image frame, the second image data corresponding to at least one of processed image data, transformed image data, or one or more image data statistics, a load/store engine (LSE) coupled to the core logic circuit, the LSE to determine a first CRC value based on the second image data obtained from the core logic circuit, and a first interface coupled to a second interface, the second interface coupled to memory, the first interface to transmit the first CRC value obtained from the memory to a host device.Type: ApplicationFiled: October 30, 2019Publication date: May 6, 2021Inventors: Niraj Nandan, Brian Chae, Mihir Mody, Rajasekhar Reddy Allu
-
Publication number: 20210005005Abstract: A method and system for dynamically transferring graphical image processing operations from a graphical processing unit (GPU) to a digital signal processor (DSP). The method includes estimating the number of operations needed for the processing a set of image data; determining the operational limits of a GPU and compare with estimated number of operations and if the operational limits are exceeded; transfer the processing operations to the DSP from the GPU. The transfer can include transferring a portion of executable code for performing the processing operations, and generating a replacement code for the GPU. The DSP can then process a portion of the image data before sending it to the GPU for further processing.Type: ApplicationFiled: September 22, 2020Publication date: January 7, 2021Inventors: Mihir Mody, Hemant Hariyani, Anand Balagopalakrishnan, Jason Jones, Ajay Jayaraj, Manoj Koul
-
Publication number: 20200379928Abstract: A processing accelerator includes a shared memory, and a stream accelerator, a memory-to-memory accelerator, and a common DMA controller coupled to the shared memory. The stream accelerator is configured to process a real-time data stream, and to store stream accelerator output data generated by processing the real-time data stream in the shared memory. The memory-to-memory accelerator is configured to retrieve input data from the shared memory, to process the input data, and to store, in the shared memory, memory-to-memory accelerator output data generated by processing the input data. The common DMA controller is configured to retrieve stream accelerator output data from the shared memory and transfer the stream accelerator output data to memory external to the processing accelerator; and to retrieve the memory-to-memory accelerator output data from the shared memory and transfer the memory-to-memory accelerator output data to memory external to the processing accelerator.Type: ApplicationFiled: August 17, 2020Publication date: December 3, 2020Inventors: Mihir MODY, Niraj NANDAN, Hetul SANGHVI, Brian CHAE, Rajasekhar Reddy ALLU, Jason A.T. JONES, Anthony LELL, Anish REGHUNATH
-
Publication number: 20200363210Abstract: Information communication circuitry, including a first integrated circuit for coupling to a second integrated circuit in a package on package configuration. The first integrated circuit comprises processing circuitry for communicating information bits, and the information bits comprise data bits and error correction bits, where the error correction bits are for indicating whether data bits are received correctly. The second integrated circuit comprises a memory for receiving and storing at least some of the information bits. The information communication circuitry also includes interfacing circuitry for selectively communicating, along a number of conductors, between the package on package configuration. In a first instance, the interfacing circuitry selectively communicates only data bits along the number of conductors.Type: ApplicationFiled: August 3, 2020Publication date: November 19, 2020Inventors: Rahul Gulati, Aishwarya Dubey, Nainala Vyagrheswarudu, Vasant Easwaran, Prashant Dinkar Karandikar, Mihir Mody
-
Patent number: 10818067Abstract: A method and system for dynamically transferring graphical image processing operations from a graphical processing unit (GPU) to a digital signal processor (DSP). The method includes estimating the number of operations needed for the processing a set of image data; determining the operational limits of a GPU and compare with estimated number of operations and if the operational limits are exceeded; transfer the processing operations to the DSP from the GPU. The transfer can include transferring a portion of executable code for performing the processing operations, and generating a replacement code for the GPU. The DSP can then process a portion of the image data before sending it to the GPU for further processing.Type: GrantFiled: May 31, 2019Date of Patent: October 27, 2020Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Mihir Mody, Hemant Hariyani, Anand Balagopalakrishnan, Jason Jones, Ajay Jayaraj, Manoj Koul
-
Patent number: 10767998Abstract: Information communication circuitry, including a first integrated circuit for coupling to a second integrated circuit in a package on package configuration. The first integrated circuit comprises processing circuitry for communicating information bits, and the information bits comprise data bits and error correction bits, where the error correction bits are for indicating whether data bits are received correctly. The second integrated circuit comprises a memory for receiving and storing at least some of the information bits. The information communication circuitry also includes interfacing circuitry for selectively communicating, along a number of conductors, between the package on package configuration. In a first instance, the interfacing circuitry selectively communicates only data bits along the number of conductors.Type: GrantFiled: August 28, 2018Date of Patent: September 8, 2020Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Rahul Gulati, Aishwarya Dubey, Nainala Vyagrheswarudu, Vasant Easwaran, Prashant Dinkar Karandikar, Mihir Mody