Patents by Inventor Natarajan Vaidhyanathan
Natarajan Vaidhyanathan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240095872Abstract: A processor-implemented method for a memory storage format to accelerate machine learning (ML) on a computing device is described. The method includes receiving an image in a first layer storage format of a neural network. The method also includes assigning addresses to image pixels of each of three channels of the first layer storage format for accessing the image pixels in a blocked ML storage acceleration format. The method further includes storing the image pixels in the blocked ML storage acceleration format according to the assigned addresses of the image pixels. The method also includes accelerating inference video processing of the image according to the assigned addresses for the image pixels corresponding to the blocked ML storage acceleration format.Type: ApplicationFiled: September 16, 2022Publication date: March 21, 2024Inventors: Colin Beaton VERRILLI, Natarajan VAIDHYANATHAN, Matthew SIMPSON, Geoffrey Carlton BERRY, Sandeep PANDE
-
Publication number: 20230223954Abstract: Techniques and apparatuses to decompress data that has been stack compressed is described. Stack compression refers to compression of data in one or more dimensions. For uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros, stack compression can be effective. In stack compression, uncompressed data block is compressed into compressed data block by removing one or more zero words from the uncompressed data block. A map metadata that maps the zero words of the uncompressed data block is generated during compression. With the use of the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.Type: ApplicationFiled: May 7, 2021Publication date: July 13, 2023Inventors: Colin Beaton VERRILLI, Natarajan VAIDHYANATHAN
-
Publication number: 20230078079Abstract: A compute-in-memory array is provided that implements a filter for a layer in a neural network. The filter multiplies a plurality of activation bits by a plurality of filter weight bits for each channel in a plurality of channels through a charge accumulation from a plurality of capacitors. The accumulated charge is digitized to provide the output of the filter.Type: ApplicationFiled: September 10, 2021Publication date: March 16, 2023Inventors: Francois Ibrahim ATALLAH, Hoan Huu NGUYEN, Colin Beaton VERRILLI, Natarajan VAIDHYANATHAN
-
Patent number: 11362672Abstract: Stack compression refers to compression of data in one or more dimensions. For uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros, stack compression can be effective. In stack compression, uncompressed data block is compressed into compressed data block by removing one or more zero words from the uncompressed data block. A map metadata that maps the zero words of the uncompressed data block is generated during compression. With the use of the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.Type: GrantFiled: May 8, 2020Date of Patent: June 14, 2022Assignee: Qualcomm IncorporatedInventors: Colin Beaton Verrilli, Natarajan Vaidhyanathan
-
Publication number: 20210351789Abstract: Techniques and apparatuses to decompress data that has been stack compressed is described. Stack compression refers to compression of data in one or more dimensions. For uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros, stack compression can be effective. In stack compression, uncompressed data block is compressed into compressed data block by removing one or more zero words from the uncompressed data block. A map metadata that maps the zero words of the uncompressed data block is generated during compression. With the use of the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.Type: ApplicationFiled: May 8, 2020Publication date: November 11, 2021Inventors: Colin Beaton VERRILLI, Natarajan VAIDHYANATHAN
-
Patent number: 11144368Abstract: Providing self-resetting multi-producer multi-consumer semaphores in distributed processor-based systems is disclosed. In one aspect, a synchronization management circuit provides a semaphore including a counting semaphore value indicator, a current wait count indicator, and a target wait count indicator. When a consumer completes a wait operation, the synchronization management circuit adjusts the value of the current wait count indicator towards the value of the target wait count indicator, and compares the value of the current wait count indicator to the value of the target wait count indicator. If the value of the current wait count indicator has reached the value of the target wait count indicator, the synchronization management circuit infers that all consumers have observed the semaphore, and accordingly resets both the counting semaphore value indicator and the current wait count indicator to an initial wait value to place the semaphore in its initial state for reuse.Type: GrantFiled: June 18, 2019Date of Patent: October 12, 2021Assignee: Qualcomm IncorproatedInventors: Colin Beaton Verrilli, Natarajan Vaidhyanathan
-
Patent number: 11010313Abstract: A method, apparatus, and system for an architecture for machine learning acceleration is presented. An apparatus includes a plurality of processing elements, each including a tightly-coupled memory, and a memory system coupled to the processing elements. A global synchronization manager is coupled to the plurality of the processing elements and to the memory system. The processing elements do not implement a coherency protocol with respect to the memory system. The processing elements implement direct memory access with respect to the memory system, and the global synchronization manager is configured to synchronize operations of the plurality of processing elements through the TCMs.Type: GrantFiled: August 29, 2019Date of Patent: May 18, 2021Assignee: Qualcomm IncorporatedInventors: Colin Beaton Verrilli, Natarajan Vaidhyanathan, Rexford Alan Hill
-
Patent number: 10936943Abstract: Providing flexible matrix processors for performing neural network convolution in matrix-processor-based devices is disclosed. In this regard, a matrix-processor-based device provides a central processing unit (CPU) and a matrix processor. The matrix processor reorganizes a plurality of weight matrices and a plurality of input matrices into swizzled weight matrices and swizzled input matrices, respectively, that have regular dimensions natively supported by the matrix processor. The matrix-processor-based device then performs a convolution operation using the matrix processor to perform matrix multiplication/accumulation operations for the regular dimensions of the weight matrices and the input matrices, and further uses the CPU to execute instructions for handling the irregular dimensions of the weight matrices and the input matrices (e.g., by executing a series of nested loops, as a non-limiting example).Type: GrantFiled: August 30, 2018Date of Patent: March 2, 2021Assignee: Qualcomm IncorporatedInventors: Colin Beaton Verrilli, Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Koustav Bhattacharya, Robert Dreyer
-
Patent number: 10877951Abstract: Techniques are disclosed for notifying network control software of new and moved source MAC addresses. In one embodiment, a switch may redirect a packet sent by a new or migrated virtual machine to the network control software as a notification. The switch does not forward the packet, thereby protecting against denial of service attacks. The switch further adds to a forwarding database a temporary entry which includes a “No_Redirect” flag for a new source MAC address, or updates an existing entry for a source MAC address that hits in the forwarding database by setting the “No_Redirect” flag. The “No_Redirect” flag indicates whether a notification has already been sent to the network control software for this source MAC address. The switch may periodically retry the notification to the network control software, until the network control software validates the source MAC address, depending on whether the “No_Redirect” is set.Type: GrantFiled: January 22, 2014Date of Patent: December 29, 2020Assignee: International Business Machines CorporationInventors: Claude Basso, Josep Cors, Venkatesh K. Janakiraman, Sze-Wa Lao, Sameer M. Shah, David A. Shedivy, Ethan M. Spiegel, Natarajan Vaidhyanathan, Colin B. Verrilli
-
Publication number: 20200401461Abstract: Providing self-resetting multi-producer multi-consumer semaphores in distributed processor-based systems is disclosed. In one aspect, a synchronization management circuit provides a semaphore including a counting semaphore value indicator, a current wait count indicator, and a target wait count indicator. When a consumer completes a wait operation, the synchronization management circuit adjusts the value of the current wait count indicator towards the value of the target wait count indicator, and compares the value of the current wait count indicator to the value of the target wait count indicator. If the value of the current wait count indicator has reached the value of the target wait count indicator, the synchronization management circuit infers that all consumers have observed the semaphore, and accordingly resets both the counting semaphore value indicator and the current wait count indicator to an initial wait value to place the semaphore in its initial state for reuse.Type: ApplicationFiled: June 18, 2019Publication date: December 24, 2020Inventors: Colin Beaton Verrilli, Natarajan Vaidhyanathan
-
Patent number: 10838862Abstract: Aspects disclosed herein include memory controllers employing memory capacity compression, and related processor-based systems and methods. In certain aspects, compressed memory controllers are employed that can provide memory capacity compression. In some aspects, a line-based memory capacity compression scheme can be employed where additional translation of a physical address (PA) to a physical buffer address is performed to allow compressed data in a system memory at the physical buffer address for efficient compressed data storage. A translation lookaside buffer (TLB) may also be employed to store TLB entries comprising PA tags corresponding to a physical buffer address in the system memory to more efficiently perform the translation of the PA to the physical buffer address in the system memory. In certain aspects, a line-based memory capacity compression scheme, a page-based memory capacity compression scheme, or a hybrid line-page-based memory capacity compression scheme can be employed.Type: GrantFiled: May 19, 2015Date of Patent: November 17, 2020Assignee: QUALCOMM IncorporatedInventors: Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Colin Beaton Verrilli
-
Patent number: 10838942Abstract: Techniques are disclosed for notifying network control software of new and moved source MAC addresses. In one embodiment, a switch may redirect a packet sent by a new or migrated virtual machine to the network control software as a notification. The switch does not forward the packet, thereby protecting against denial of service attacks. The switch further adds to a forwarding database a temporary entry which includes a “No_Redirect” flag for a new source MAC address, or updates an existing entry for a source MAC address that hits in the forwarding database by setting the “No_Redirect” flag. The “No_Redirect” flag indicates whether a notification has already been sent to the network control software for this source MAC address. The switch may periodically retry the notification to the network control software, until the network control software validates the source MAC address, depending on whether the “No_Redirect” is set.Type: GrantFiled: February 11, 2014Date of Patent: November 17, 2020Assignee: International Business Machines CorporationInventors: Claude Basso, Josep Cors, Venkatesh K. Janakiraman, Sze-Wa Lao, Sameer M. Shah, David A. Shedivy, Ethan M. Spiegel, Natarajan Vaidhyanathan, Colin B. Verrilli
-
Patent number: 10747501Abstract: Providing efficient floating-point operations using matrix processors in processor-based systems is disclosed. In this regard, a matrix-processor-based device provides a matrix processor comprising a positive partial sum accumulator and a negative partial sum accumulator. As the matrix processor processes pairs of floating-point operands, the matrix processor calculates an intermediate product based on a first floating-point operand and a second floating-point operand and determines a sign of the intermediate product. Based on the sign, the matrix processor normalizes the intermediate product with a partial sum fraction of the positive partial sum accumulator or the negative partial sum accumulator, then adds the intermediate product to the positive sum accumulator or the negative sum accumulator.Type: GrantFiled: August 30, 2018Date of Patent: August 18, 2020Assignee: Qualcomm IncorporatedInventors: Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Robert Dreyer, Colin Beaton Verrilli, Koustav Bhattacharya
-
Patent number: 10725740Abstract: Providing efficient multiplication of sparse matrices in matrix-processor-based devices is disclosed herein. In one aspect, a matrix processor of a matrix-processor-based device includes a plurality of sequencers coupled to a plurality of multiply/accumulate (MAC) units for performing multiplication and accumulation operations. Each sequencer determines whether a product of an element of a first input matrix to be multiplied with an element of a second input matrix has a value of zero (e.g., by determining whether the element of the first input matrix has a value of zero, or by determining whether either the element of the first input matrix or that of the second input matrix has a value of zero). If the product of the elements of the first input matrix and the second input matrix does not have a value of zero, the sequencer provides the elements to a MAC unit to perform a multiplication and accumulation operation.Type: GrantFiled: August 30, 2018Date of Patent: July 28, 2020Assignee: Qualcomm IncorporatedInventors: Mattheus Cornelis Antonius Adrianus Heddes, Robert Dreyer, Colin Beaton Verrilli, Natarajan Vaidhyanathan, Koustav Bhattacharya
-
Publication number: 20200073830Abstract: A method, apparatus, and system for an architecture for machine learning acceleration is presented. An apparatus includes a plurality of processing elements, each including a tightly-coupled memory, and a memory system coupled to the processing elements. A global synchronization manager is coupled to the plurality of the processing elements and to the memory system. The processing elements do not implement a coherency protocol with respect to the memory system. The processing elements implement direct memory access with respect to the memory system, and the global synchronization manager is configured to synchronize operations of the plurality of processing elements through the TCMs.Type: ApplicationFiled: August 29, 2019Publication date: March 5, 2020Inventors: Colin Beaton VERRILLI, Natarajan VAIDHYANATHAN, Rexford Alan HILL
-
Patent number: 10541921Abstract: Embodiments provide a TCAM-based access control list that supports disjunction operations in rules. A network frame is received. Embodiments determine set TCP flags of the network frame. Upon determining that the set TCP flags match a first entry in a numeric range table, bits of a search key corresponding to the first entry are updated. The search key accesses a second entry stored in a TCAM. The first entry further comprises an encode field to scan a TCP header of the network frame for set TCP flags, a first mask field to a condition corresponding to unset TCP flags to identify in the network frame, a second mask field to a condition corresponding to set TCP flags to identify in the network frame, and an operation field specifying a disjunction operation for comparing the set TCP flags with the first mask field and the second mask field.Type: GrantFiled: November 21, 2017Date of Patent: January 21, 2020Assignee: International Business Machines CorporationInventors: Claude Basso, Joseph A. Kirscht, Natarajan Vaidhyanathan
-
Patent number: 10503661Abstract: Providing memory bandwidth compression using compressed memory controllers (CMCs) in a central processing unit (CPU)-based system is disclosed. In this regard, in some aspects, a CMC is configured to receive a memory read request to a physical address in a system memory, and read a compression indicator (CI) for the physical address from a master directory and/or from error correcting code (ECC) bits of the physical address. Based on the CI, the CMC determines a number of memory blocks to be read for the memory read request, and reads the determined number of memory blocks. In some aspects, a CMC is configured to receive a memory write request to a physical address in the system memory, and generate a CI for write data based on a compression pattern of the write data. The CMC updates the master directory and/or the ECC bits of the physical address with the generated CI.Type: GrantFiled: May 20, 2015Date of Patent: December 10, 2019Assignee: QUALCOMM IncorporatedInventors: Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Colin Beaton Verrilli
-
Patent number: 10467092Abstract: Providing space-efficient storage for dynamic random access memory (DRAM) cache tags is provided. In one aspect, a DRAM cache management circuit provides a plurality of cache entries, each of which contains a tag storage region, a data storage region, and an error protection region. The DRAM cache management circuit is configured to store data to be cached in the data storage region of each cache entry. The DRAM cache management circuit is also configured to use an error detection code (EDC) instead of an error correcting code (ECC), and to store a tag and the EDC for each cache entry in the error protection region of the cache entry. In this manner, the capacity of a DRAM cache can be increased by avoiding the need for the tag storage region for each cache entry, while still providing error detection for the cache entry.Type: GrantFiled: March 30, 2016Date of Patent: November 5, 2019Assignee: QUALCOMM IncorporatedInventors: Natarajan Vaidhyanathan, Mattheus Cornelis Antonius Adrianus Heddes, Colin Beaton Verrilli
-
Patent number: 10419267Abstract: Techniques are disclosed for notifying network control software of new and moved source MAC addresses. In one embodiment, a switch detects packets sent by a new or migrated virtual machine, and sends a copy of a detected packet to the network control software as a notification. The switch further learns the source MAC address, thereby permitting the entry to be used for normal forwarding prior to validation of the entry and the VM associated therewith by the network control software. Until the network control software has validated the VM, the switch may periodically retry the notification to the network control software. “No_Redirect” and “Not_Validated” flags may be used to indicate whether a notification has already been attempted and thus no retry is necessary, and that the VM associated with the VM has not yet been validated, respectively.Type: GrantFiled: January 22, 2014Date of Patent: September 17, 2019Assignee: LENOVO ENTERPRISE SOLUTIONS (SINGAPORE) PTE. LTD.Inventors: Claude Basso, Josep Cors, Venkatesh K. Janakiraman, Sze-Wa Lao, Sameer M. Shah, David A. Shedivy, Ethan M. Spiegel, Natarajan Vaidhyanathan, Colin B. Verrilli
-
Patent number: 10236917Abstract: Providing memory bandwidth compression in chipkill-correct memory architectures is disclosed. In this regard, a compressed memory controller (CMC) introduces a specified error pattern into chipkill-correct error correcting code (ECC) bits to indicate compressed data. To encode data, the CMC applies a compression algorithm to an uncompressed data block to generate a compressed data block. The CMC then generates ECC data for the compressed data block (i.e., an “inner” ECC segment), appends the inner ECC segment to the compressed data block, and generates ECC data for the compressed data block and the inner ECC segment (i.e., an “outer” ECC segment). The CMC then intentionally inverts a specified plurality of bytes of the outer ECC segment (e.g., in portions of the outer ECC segment stored in different physical memory chips by a chipkill-correct ECC mechanism). The outer ECC segment is then appended to the compressed data block and the inner ECC segment.Type: GrantFiled: September 15, 2016Date of Patent: March 19, 2019Assignee: QUALCOMM IncorporatedInventors: Natarajan Vaidhyanathan, Luther James Blackwood, Mattheus Cornelis Antonius Adrianus Heddes, Michael Raymond Trombley, Colin Beaton Verrilli