Patents by Inventor Krishnakumar Narayanan

Krishnakumar Narayanan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

FLEXIBLE MATRIX PROCESSING

Publication number: 20240095304

Abstract: A system includes a matrix transpose component, a matrix processing component, a data modification component, and a data reduction component. The matrix transpose component is configured to transpose a stored matrix to an output matrix. The matrix processing component is configured to multiply the output matrix with a mask vector to determine a result vector. The data modification component is configured to modify at least a portion of the result vector to determine a modified vector. The data reduction component is configured to sum at least a portion of elements included in the modified vector.

Type: Application

Filed: October 23, 2023

Publication date: March 21, 2024

Inventors: Krishnakumar Narayanan Nair, Thomas Mark Ulrich, Ehsan Khish Ardestani Zadeh

Device and method for flexibly summing matrix values

Patent number: 11829441

Abstract: A device includes a matrix transpose component, a matrix processing component, a data alignment component, and a data reduction component. The matrix transpose component is configured to transpose an input matrix of elements to output an output matrix of the elements that have been transposed. The matrix processing component is configured to multiply a first multiplication input matrix with a second multiplication input matrix, wherein the output matrix of the matrix transpose component is utilized as the first multiplication input matrix and a mask vector is utilized as the second multiplication input matrix. The data alignment component is configured to modify at least a portion of elements of a result of the matrix processing component. The data reduction component is configured to sum at least the elements of the modified result of the matrix processing component to determine a sum of the group of values.
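The Python sketch below walks through the four components on a toy example. It rests on assumptions the abstract does not state: each row of the input matrix holds one unsigned 32-bit value split into four 8-bit segments (least significant first), the mask vector holds 1 for values to include and 0 otherwise, and "data alignment" means shifting each per-segment sum back to its bit position. All function names are hypothetical.

```python
SEG_BITS = 8   # assumed width of each stored segment
NUM_SEGS = 4   # assumed segments per value (4 x 8 bits = 32-bit values)


def split_value(value):
    """Split an unsigned value into NUM_SEGS segments, least significant first."""
    seg_mask = (1 << SEG_BITS) - 1
    return [(value >> (SEG_BITS * j)) & seg_mask for j in range(NUM_SEGS)]


def transpose(matrix):
    """Matrix transpose component: rows become columns."""
    return [list(col) for col in zip(*matrix)]


def mat_vec(matrix, vector):
    """Matrix processing component: multiply each row by the mask vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def flexible_sum(values, mask):
    """Sum the values whose mask entry is 1, using only per-segment arithmetic."""
    input_matrix = [split_value(v) for v in values]   # one row per value
    transposed = transpose(input_matrix)              # one row per segment position
    per_segment = mat_vec(transposed, mask)           # masked sum of each segment position
    # Data alignment: shift each partial sum back to its segment's bit position.
    aligned = [s << (SEG_BITS * j) for j, s in enumerate(per_segment)]
    return sum(aligned)                               # data reduction


values = [0x12345678, 0xFFFFFFFF, 7]
print(flexible_sum(values, [1, 0, 1]) == 0x12345678 + 7)  # True
```

Because the mask selects the participating values, one hardware path can sum any subset of the group. Each per-segment sum can exceed 8 bits, so a hardware version would need wider intermediate results.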

Type: Grant

Filed: June 7, 2022

Date of Patent: November 28, 2023

Assignee: Meta Platforms, Inc.

Inventors: Krishnakumar Narayanan Nair, Thomas Mark Ulrich, Ehsan Khish Ardestani Zadeh

Systems and methods for reducing power consumption of convolution operations for artificial neural networks

Patent number: 11763131

Abstract: A computer-implemented method may include retrieving, via a remote data bus from a data store remote from a hardware accelerator to a local memory device (LMD) included in the hardware accelerator, (1) a filter matrix comprising a set of filter vectors corresponding to a filter location included in each of a set of filters of a convolutional layer of an artificial neural network (ANN), and (2) an activation matrix comprising a primary and a secondary set of activation vectors, each activation vector included in an activation volume inputted into the convolutional layer. The method may also include directing a hardware matrix multiplication unit (MMU) included in the hardware accelerator and communicatively coupled to the LMD via a local data bus, to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix.
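The idea underneath this abstract is that a convolution can be computed as a sum of matrix multiplications, one per filter location. The Python sketch below assumes stride 1, no padding, and an H x W x C activation volume. For each filter location (r, s) it forms a K x C filter matrix (one filter vector from each of the K filters) and a P x C activation matrix (one activation vector per output position), multiplies them, and accumulates the partial results. Function names are illustrative, and the sketch ignores the data-bus and LMD details that the patent actually concerns.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix multiply: a is n x t, b is t x m."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conv_direct(act, filters):
    """Reference convolution. act[y][x][c], filters[k][r][s][c] -> out[y][x][k]."""
    H, W, C = len(act), len(act[0]), len(act[0][0])
    K, R, S = len(filters), len(filters[0]), len(filters[0][0])
    return [[[sum(act[y + r][x + s][c] * filters[k][r][s][c]
                  for r in range(R) for s in range(S) for c in range(C))
              for k in range(K)]
             for x in range(W - S + 1)]
            for y in range(H - R + 1)]


def conv_via_mmo(act, filters):
    """Same result, computed as one matrix multiplication per filter location."""
    H, W = len(act), len(act[0])
    K, R, S = len(filters), len(filters[0]), len(filters[0][0])
    OH, OW = H - R + 1, W - S + 1
    out = [[0] * K for _ in range(OH * OW)]   # row p = output position
    for r in range(R):
        for s in range(S):
            filter_matrix = [filters[k][r][s] for k in range(K)]                    # K x C
            act_matrix = [act[y + r][x + s] for y in range(OH) for x in range(OW)]  # P x C
            partial = matmul(act_matrix, transpose(filter_matrix))                  # P x K
            for p in range(OH * OW):
                for k in range(K):
                    out[p][k] += partial[p][k]
    return [out[y * OW:(y + 1) * OW] for y in range(OH)]
```

Each (r, s) step reuses the same activation data shifted by one filter location, which is why keeping these matrices in a local memory close to the multiplier is attractive.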

Type: Grant

Filed: August 6, 2021

Date of Patent: September 19, 2023

Assignee: Meta Platforms, Inc.

Inventor: Krishnakumar Narayanan Nair

HIGH BANDWIDTH MEMORY SYSTEM WITH DYNAMICALLY PROGRAMMABLE DISTRIBUTION SCHEME

Publication number: 20230251903

Abstract: A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access data from the plurality of memory units using a dynamically programmable distribution scheme.

Type: Application

Filed: April 20, 2023

Publication date: August 10, 2023

Inventors: Abdulkadir Utku Diril, Olivia Wu, Krishnakumar Narayanan Nair, Anup Ramesh Kadkol, Aravind Kalaiah, Pankaj Kansal

Systems and methods for reducing data movement during convolution operations in artificial neural networks

Patent number: 11699081

Abstract: The disclosed computer-implemented method may include (1) receiving, at a hardware accelerator that supports an ANN, an activation data set that is to undergo a convolution operation via a filter kernel of the ANN, (2) receiving, at the hardware accelerator, an argument indicating that the filter kernel exceeds at least one boundary of the activation data set when slid across a certain position during the convolution operation, (3) determining, based at least in part on the argument, that the hardware accelerator is to generate padding data at the boundary of the activation data set in connection with the certain position of the filter kernel, and then (4) performing, at the hardware accelerator, the convolution operation by processing a portion of the activation data set and the padding data when the filter kernel slides across the certain position. Various other systems and methods are also disclosed.
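A minimal single-channel Python sketch of the idea, assuming stride 1 and zero padding: instead of storing a padded copy of the activation data, the convolution consults a per-position boundary argument (modeled here by a hypothetical `exceeded_boundaries` helper) and generates zeros only where the kernel hangs over an edge. A stored-padding baseline is included so the two approaches can be compared.

```python
def exceeded_boundaries(oy, ox, H, W, R, S, pad):
    """Hypothetical 'argument': which activation edges an R x S window at (oy, ox) crosses."""
    top, left = oy - pad, ox - pad
    flags = set()
    if top < 0:
        flags.add("top")
    if left < 0:
        flags.add("left")
    if top + R > H:
        flags.add("bottom")
    if left + S > W:
        flags.add("right")
    return flags


def conv2d_generated_padding(act, kernel, pad):
    """Single-channel stride-1 convolution; padding values are generated, never stored."""
    H, W = len(act), len(act[0])
    R, S = len(kernel), len(kernel[0])
    OH, OW = H + 2 * pad - R + 1, W + 2 * pad - S + 1
    out = [[0] * OW for _ in range(OH)]
    for oy in range(OH):
        for ox in range(OW):
            needs_padding = bool(exceeded_boundaries(oy, ox, H, W, R, S, pad))
            acc = 0
            for r in range(R):
                for s in range(S):
                    y, x = oy + r - pad, ox + s - pad
                    if needs_padding and not (0 <= y < H and 0 <= x < W):
                        value = 0          # padding element produced on the fly
                    else:
                        value = act[y][x]  # real activation data
                    acc += value * kernel[r][s]
            out[oy][ox] = acc
    return out


def conv2d_stored_padding(act, kernel, pad):
    """Baseline: materialize a zero-padded copy first, then convolve."""
    H, W = len(act), len(act[0])
    padded = [[0] * (W + 2 * pad) for _ in range(H + 2 * pad)]
    for y in range(H):
        for x in range(W):
            padded[y + pad][x + pad] = act[y][x]
    return conv2d_generated_padding(padded, kernel, 0)
```

Generating padding on demand avoids moving (H + 2*pad) * (W + 2*pad) values through memory; the abstract does not say whether the accelerator uses exactly this per-position check.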

Type: Grant

Filed: December 20, 2019

Date of Patent: July 11, 2023

Assignee: Meta Platforms, Inc.

Inventors: Ehsan Khish Ardestani Zadeh, Martin Schatz, Krishnakumar Narayanan Nair, Yuchen Hao, Abdulkadir Utku Diril, Rakesh Komuravelli

High bandwidth memory system with dynamically programmable distribution scheme

Patent number: 11663043

Abstract: A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access data from the plurality of memory units using a dynamically programmable distribution scheme.
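The abstract does not describe what a "dynamically programmable distribution scheme" looks like internally. As a hedged illustration only, the Python sketch below treats the scheme as a few runtime parameters (number of memory units, banks per unit, and an interleaving granularity, all our own assumptions) that map a byte address to a memory unit and bank. Reprogramming the granularity changes how one access spreads across units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributionScheme:
    """Hypothetical runtime-programmable address interleaving."""
    num_units: int        # memory units attached to the processor
    banks_per_unit: int   # memory banks inside each memory unit
    granularity: int      # bytes stored contiguously before moving to the next unit

    def locate(self, address):
        """Map a byte address to (memory unit, bank, offset within the chunk)."""
        chunk = address // self.granularity
        unit = chunk % self.num_units                           # round-robin across units
        bank = (chunk // self.num_units) % self.banks_per_unit  # then across banks
        return unit, bank, address % self.granularity

    def units_touched(self, start, length):
        """Set of memory units that a contiguous access of `length` bytes would hit."""
        first = start // self.granularity
        last = (start + length - 1) // self.granularity
        return {chunk % self.num_units for chunk in range(first, last + 1)}


fine = DistributionScheme(num_units=4, banks_per_unit=8, granularity=64)
coarse = DistributionScheme(num_units=4, banks_per_unit=8, granularity=256)
print(sorted(fine.units_touched(0, 256)))    # [0, 1, 2, 3]: spread across all units
print(sorted(coarse.units_touched(0, 256)))  # [0]: kept in a single unit
```

In this sketch a fine granularity spreads one large read over every unit for bandwidth, while a coarse one keeps related data together; making the choice programmable lets software pick per workload.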

Type: Grant

Filed: December 2, 2019

Date of Patent: May 30, 2023

Assignee: Meta Platforms, Inc.

Inventors: Abdulkadir Utku Diril, Olivia Wu, Krishnakumar Narayanan Nair, Anup Ramesh Kadkol, Aravind Kalaiah, Pankaj Kansal

Systems and methods for reducing power consumption of convolution operations of artificial neural networks

Patent number: 11599181

Abstract: A computer-implemented method may include (1) maintaining (a) a filter matrix in a filter cache included in a local memory device (LMD) included in a hardware accelerator, and (b) a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD, (2) for each activation matrix, directing a matrix multiplication unit (MMU) included in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix, (3) loading an additional filter matrix into the filter cache, and (4) directing the MMU to execute a plurality of additional MMOs, each additional MMO using one filter matrix included in the filter cache and one activation matrix included in the activation cache, such that the MMU reuses the filter matrix for at least one additional MMO and uses the additional filter matrix for a different additional MMO.
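The saving described here comes from ordering matrix multiplications so that a filter matrix already in the filter cache is reused. The Python sketch below is a deliberate simplification (a single-entry filter cache, abstract filter and activation indices, and a hypothetical `schedule_mmos` function) that counts how many filter-cache loads two orderings of the same set of MMOs would need. It does not model the two-filter cache the abstract describes.

```python
def schedule_mmos(num_filters, num_activations, filter_stationary):
    """Return the (filter, activation) MMO order and the number of filter-cache loads."""
    if filter_stationary:
        # Keep one filter matrix resident while sweeping every activation matrix.
        pairs = [(f, a) for f in range(num_filters) for a in range(num_activations)]
    else:
        # Activation-major order: cycle through all filters for each activation matrix.
        pairs = [(f, a) for a in range(num_activations) for f in range(num_filters)]
    cached = None
    loads = 0
    for f, _ in pairs:
        if f != cached:   # filter not in the single-entry cache: load it
            cached = f
            loads += 1
    return pairs, loads


_, stationary_loads = schedule_mmos(3, 4, filter_stationary=True)
_, streaming_loads = schedule_mmos(3, 4, filter_stationary=False)
print(stationary_loads, streaming_loads)  # 3 12
```

Both orders perform the same twelve MMOs; only the number of filter transfers differs, and each avoided transfer is data movement (and power) saved.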

Type: Grant

Filed: December 23, 2019

Date of Patent: March 7, 2023

Assignee: Meta Platforms, Inc.

Inventors: Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Yuchen Hao, Thomas Mark Ulrich, Rakesh Komuravelli, Ehsan Khish Ardestani Zadeh, Martin Schatz

USING A LOW-BIT-WIDTH DOT PRODUCT ENGINE TO SUM HIGH-BIT-WIDTH NUMBERS

Publication number: 20230056304

Abstract: A system includes a vector multiplier configured to multiply a first vector of integer elements with a second vector of integer elements to determine a resulting vector of integer elements, wherein integer elements of the first and second vectors of integer elements are represented using a first number of bits and an integer element of the first vector of integer elements represents a portion of a value of a group of values. The system further includes a vector adder configured to add together the integer elements of the resulting vector of integer elements to determine a summed result, a bit shifter configured to shift bits of the summed result leftward, and an accumulator configured to determine an accumulated output sum that includes the leftward-shifted summed result.
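A minimal Python sketch of the vector multiplier, vector adder, bit shifter, and accumulator pipeline, under our own assumptions: signed 32-bit values are split into four 8-bit segments (three unsigned low bytes and a signed top byte), the second vector holds small integer weights (all ones gives a plain sum), and each loop iteration models one pass through the narrow engine. Function names are hypothetical.

```python
SEG_BITS = 8
NUM_SEGS = 4   # 4 x 8-bit segments cover a signed 32-bit value


def segment(value, j):
    """Segment j of a signed 32-bit value: low segments unsigned, top segment signed."""
    if j == NUM_SEGS - 1:
        return value >> (SEG_BITS * j)   # arithmetic shift keeps the sign
    return (value >> (SEG_BITS * j)) & ((1 << SEG_BITS) - 1)


def wide_dot(values, weights):
    """Compute sum(v * w) using only products of narrow (8-bit) segments."""
    accumulator = 0
    for j in range(NUM_SEGS):
        first = [segment(v, j) for v in values]              # low-bit-width elements
        products = [a * b for a, b in zip(first, weights)]   # vector multiplier
        summed = sum(products)                               # vector adder
        accumulator += summed << (SEG_BITS * j)              # bit shifter + accumulator
    return accumulator


print(wide_dot([-5, 300, 70000], [1, 1, 1]))  # 70295
```

This works because each value equals the sum of its segments shifted to their bit positions, so shifting each segment-wise sum and accumulating reproduces the full-width result.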

Type: Application

Filed: August 24, 2022

Publication date: February 23, 2023

Inventors: Thomas Mark Ulrich, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh

Grouped convolution using point-to-point connected channel convolution engines

Patent number: 11580192

Abstract: A processor system comprises a plurality of processing elements. Each processing element includes a corresponding convolution processor unit configured to perform a portion of a groupwise convolution. The corresponding convolution processor unit determines multiplication results by multiplying each data element of a portion of data elements in a convolution data matrix with a corresponding data element in a corresponding groupwise convolution weight matrix. The portion of data elements in the convolution data matrix that are multiplied belong to different channels and different groups. For each specific channel of the different channels, the corresponding convolution processor unit sums together at least some of the multiplication results belonging to the same specific channel to determine a corresponding channel convolution result data element.
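For reference, the short Python function below shows what a groupwise convolution computes: input channels are split into groups, and each output channel sees only the input channels of its own group. It uses one spatial dimension, stride 1, and no padding to stay short (our simplifications), and it says nothing about how the patented processing elements divide the work among themselves.

```python
def grouped_conv1d(act, weights, groups):
    """act[x][c]; weights[k][s][ci], where ci indexes channels inside k's group."""
    C = len(act[0])
    K, S = len(weights), len(weights[0])
    assert C % groups == 0 and K % groups == 0, "groups must divide channels"
    c_per_group = C // groups
    k_per_group = K // groups
    out = []
    for x in range(len(act) - S + 1):
        row = []
        for k in range(K):
            g = k // k_per_group          # group this output channel belongs to
            base = g * c_per_group        # first input channel of that group
            row.append(sum(act[x + s][base + ci] * weights[k][s][ci]
                           for s in range(S) for ci in range(c_per_group)))
        out.append(row)
    return out


act = [[1, 10], [2, 20], [3, 30]]          # 3 positions, 2 channels
weights = [[[1], [1]], [[1], [-1]]]        # 2 groups, one channel each
print(grouped_conv1d(act, weights, 2))     # [[3, -10], [5, -10]]
```

With `groups` equal to the channel count this reduces to a depthwise convolution; with `groups=1` it is an ordinary convolution.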

Type: Grant

Filed: April 8, 2020

Date of Patent: February 14, 2023

Assignee: Meta Platforms, Inc.

Inventors: Rakesh Komuravelli, Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

HIGH THROUGHPUT MATRIX PROCESSOR WITH SUPPORT FOR CONCURRENTLY PROCESSING MULTIPLE MATRICES

Publication number: 20230004624

Abstract: A system comprises a data input vector unit, a weight input vector unit, and a plurality of calculation units. The data input vector unit is configured to concurrently receive elements of different rows of a first and second data matrix. The weight input vector unit is configured to receive a combined weight vector and at least in part concurrently provide obtained weight elements of a first and second weight matrix to a corresponding first and second group of calculation units. At least one calculation unit of each group of the first and second group of calculation units is configured to multiply elements from the data input vector unit with corresponding elements of the corresponding weight matrix from the weight input vector unit and sum together multiplication results of the corresponding calculation unit to at least in part determine a corresponding element in a first or second convolution result matrix.

Type: Application

Filed: June 30, 2022

Publication date: January 5, 2023

Inventors: Krishnakumar Narayanan Nair, Olivia Wu, Ehsan Khish Ardestani Zadeh, Abdulkadir Utku Diril, Thomas Mark Ulrich, Yuchen Hao, Rakesh Komuravelli, Aravind Kalaiah

High bandwidth memory system with distributed request broadcasting masters

Patent number: 11537301

Abstract: A system comprises a processor and a plurality of memory units. The processor is coupled to each of the plurality of memory units by a plurality of network connections. The processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array. Each processing element that is located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements located along a same axis of the two-dimensional array.
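A small Python sketch of the master assignment, assuming an n x n array and that each group is a row (or, optionally, a column; the abstract only says "a same axis"). Because processing element (i, i) is the only diagonal element on row i and on column i, placing masters on the diagonal gives every row and every column exactly one master. Function names are hypothetical.

```python
def broadcast_master(row, col, axis="row"):
    """Diagonal PE acting as request broadcasting master for PE (row, col)."""
    return (row, row) if axis == "row" else (col, col)


def master_groups(n, axis="row"):
    """Map each diagonal master of an n x n array to the PEs it serves."""
    groups = {}
    for r in range(n):
        for c in range(n):
            groups.setdefault(broadcast_master(r, c, axis), []).append((r, c))
    return groups


print(master_groups(4)[(2, 2)])  # [(2, 0), (2, 1), (2, 2), (2, 3)]
```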

Type: Grant

Filed: May 4, 2021

Date of Patent: December 27, 2022

Assignee: Meta Platforms, Inc.

Inventors: Abdulkadir Utku Diril, Olivia Wu, Krishnakumar Narayanan Nair, Aravind Kalaiah, Anup Ramesh Kadkol, Pankaj Kansal

Mapping convolution to a channel convolution engine

Patent number: 11537865

Abstract: A processor system comprises a first and second group of registers and a hardware channel convolution processor unit. The first group of registers is configured to store data elements of channels of a portion of a convolution data matrix. Each register stores at least one data element from each channel. The second group of registers is configured to store data elements of convolution weight matrices including a separate convolution weight matrix for each channel. Each register stores at least one data element from each convolution weight matrix. The hardware channel convolution processor unit is configured to multiply each data element in the first group of registers with a corresponding data element in the second group of registers and sum together the multiplication results for each specific channel to determine corresponding channel convolution result data elements in a corresponding channel convolution result matrix.
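The register layout in this abstract can be sketched in Python as follows, assuming an R x S kernel per channel, stride 1, and one output position at a time. Each "register" is modeled as a list holding one element per channel; multiplying matching elements and summing per channel yields one depthwise result for every channel at once. All names are hypothetical.

```python
def gather_data_registers(act, oy, ox, R, S):
    """One register per kernel position, holding that pixel's value for every channel."""
    return [act[oy + r][ox + s] for r in range(R) for s in range(S)]


def gather_weight_registers(kernels, R, S):
    """kernels[c][r][s] -> one register per kernel position, one element per channel."""
    return [[kernels[c][r][s] for c in range(len(kernels))]
            for r in range(R) for s in range(S)]


def channel_convolution(data_regs, weight_regs):
    """Multiply matching elements and sum the products separately for each channel."""
    result = [0] * len(data_regs[0])
    for d, w in zip(data_regs, weight_regs):
        for c in range(len(result)):
            result[c] += d[c] * w[c]   # products never mix channels
    return result
```

For example, with channel 0 using an all-ones 3 x 3 kernel and channel 1 a kernel that keeps only the center tap, a single call produces the window sum for channel 0 and the center value for channel 1.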

Type: Grant

Filed: February 18, 2020

Date of Patent: December 27, 2022

Assignee: Meta Platforms, Inc.

Inventors: Krishnakumar Narayanan Nair, Rakesh Komuravelli, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

High bandwidth memory system with crossbar switch for dynamically programmable distribution scheme

Patent number: 11531619

Abstract: A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. Each request processing unit includes a plurality of decomposition units and a crossbar switch, the crossbar switch communicatively connecting each of the plurality of decomposition units to each of the plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access the plurality of memory units using a dynamically programmable distribution scheme.
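A hedged Python sketch of what a decomposition unit and crossbar might do inside one memory unit. It assumes (the abstract does not say) that banks are interleaved every `bank_bytes` bytes, that a decomposition unit splits an incoming request at bank boundaries, and that the crossbar lets any piece reach any bank's queue. Both functions are hypothetical.

```python
def decompose(address, length, bank_bytes, num_banks):
    """Split one request into (bank, address, size) pieces that never cross a bank chunk."""
    pieces = []
    end = address + length
    while address < end:
        chunk_index = address // bank_bytes
        piece_end = min(end, (chunk_index + 1) * bank_bytes)
        pieces.append((chunk_index % num_banks, address, piece_end - address))
        address = piece_end
    return pieces


def crossbar_route(pieces, num_banks):
    """Crossbar: any decomposition unit's piece can be delivered to any bank's queue."""
    queues = [[] for _ in range(num_banks)]
    for bank, addr, size in pieces:
        queues[bank].append((addr, size))
    return queues


print(decompose(100, 100, 64, 4))  # [(1, 100, 28), (2, 128, 64), (3, 192, 8)]
```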

Type: Grant

Filed: December 17, 2019

Date of Patent: December 20, 2022

Assignee: Meta Platforms, Inc.

Inventors: Olivia Wu, Abdulkadir Utku Diril, Krishnakumar Narayanan Nair, Aravind Kalaiah, Anup Ramesh Kadkol, Pankaj Kansal

Mapping convolution to a partition channel convolution engine

Patent number: 11520853

Abstract: A processor system comprises two groups of registers and a hardware channel convolution processor unit. The first group of registers is configured to store data elements of channels of a portion of a convolution data matrix. Each register stores at least one data element from each channel. The second group of registers is configured to store data elements of convolution weight matrices including a separate matrix for each channel. Each register stores at least one data element from each matrix. The hardware channel convolution processor unit is configured to multiply each data element in a first and second portion of the first group of registers with a corresponding data element in the second group of registers to determine corresponding multiplication results and sum together the multiplication results for each specific channel to determine two corresponding channel convolution result data elements in a corresponding channel convolution result matrix.

Type: Grant

Filed: February 28, 2020

Date of Patent: December 6, 2022

Assignee: Meta Platforms, Inc.

Inventors: Krishnakumar Narayanan Nair, Rakesh Komuravelli, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

Support for different matrix multiplications by selecting adder tree intermediate results

Patent number: 11520854

Abstract: A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix.
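A minimal Python model of the idea, assuming a binary adder tree over a power-of-two number of multipliers. Level 0 holds the raw products, level k holds sums of groups of 2**k products, and the last level is the full dot product, so selecting an intermediate level turns one long dot product into several shorter independent ones. Function names are hypothetical.

```python
def adder_tree_levels(products):
    """All levels of a binary adder tree; level 0 is the raw products."""
    assert len(products) & (len(products) - 1) == 0, "length must be a power of two"
    levels = [list(products)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def multiply_and_reduce(a, b, level):
    """Element-wise multiply, then return the adder-tree results at `level`.

    level == log2(len(a)) gives one full dot product; a smaller level gives
    several independent dot products of length 2**level each.
    """
    products = [x * y for x, y in zip(a, b)]
    return adder_tree_levels(products)[level]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1] * 8
print(multiply_and_reduce(a, b, 3))  # [36]
print(multiply_and_reduce(a, b, 2))  # [10, 26]
```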

Type: Grant

Filed: October 29, 2019

Date of Patent: December 6, 2022

Assignee: Meta Platforms, Inc.

Inventors: Yuchen Hao, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh, Rakesh Komuravelli, Abdulkadir Utku Diril, Thomas Mark Ulrich

DEVICE AND METHOD FOR FLEXIBLY SUMMING MATRIX VALUES

Publication number: 20220374499

Abstract: A device includes a matrix transpose component, a matrix processing component, a data alignment component, and a data reduction component. The matrix transpose component is configured to transpose an input matrix of elements to output an output matrix of the elements that have been transposed. The matrix processing component is configured to multiply a first multiplication input matrix with a second multiplication input matrix, wherein the output matrix of the matrix transpose component is utilized as the first multiplication input matrix and a mask vector is utilized as the second multiplication input matrix. The data alignment component is configured to modify at least a portion of elements of a result of the matrix processing component. The data reduction component is configured to sum at least the elements of the modified result of the matrix processing component to determine a sum of the group of values.

Type: Application

Filed: June 7, 2022

Publication date: November 24, 2022

Inventors: Krishnakumar Narayanan Nair, Thomas Mark Ulrich, Ehsan Khish Ardestani Zadeh

MATRIX PROCESSING INSTRUCTION WITH OPTIONAL UP/DOWN SAMPLING OF MATRIX

Publication number: 20220365784

Abstract: A processor system comprises a shared memory and a processing element. The processing element includes a matrix processor unit and is in communication with the shared memory. The processing element is configured to receive a processor instruction specifying a data matrix and a matrix manipulation operation. A manipulation matrix based on the processor instruction is identified. The data matrix and the manipulation matrix are used to perform a matrix operation to determine a result matrix.
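To illustrate how a "manipulation matrix" can turn resampling into a plain matrix multiply, the Python sketch below assumes nearest-neighbor column upsampling and every-k-th-column downsampling; the instruction format (`{"op": ..., "factor": ...}`) and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix multiply: a is n x t, b is t x m."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def manipulation_matrix(op, width, factor):
    """Hypothetical lookup of the manipulation matrix named by an instruction."""
    if op == "upsample":      # width x (width * factor): repeat each column `factor` times
        return [[1 if col // factor == row else 0 for col in range(width * factor)]
                for row in range(width)]
    if op == "downsample":    # width x (width // factor): keep every `factor`-th column
        return [[1 if row == col * factor else 0 for col in range(width // factor)]
                for row in range(width)]
    raise ValueError(f"unsupported manipulation: {op}")


def execute(instruction, data):
    """Identify the manipulation matrix from the instruction, then do one matrix multiply."""
    m = manipulation_matrix(instruction["op"], len(data[0]), instruction["factor"])
    return matmul(data, m)


print(execute({"op": "upsample", "factor": 2}, [[1, 2], [3, 4]]))  # [[1, 1, 2, 2], [3, 3, 4, 4]]
```

Because the manipulation is expressed as a matrix, the same matrix unit that performs ordinary multiplications can perform the resampling without dedicated data-shuffling hardware.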

Type: Application

Filed: May 25, 2022

Publication date: November 17, 2022

Inventors: Thomas Mark Ulrich, Krishnakumar Narayanan Nair, Yuchen Hao

Systems and methods for handling padding regions in convolution operations

Patent number: 11501147

Abstract: A disclosed computer-implemented method may include maintaining, within a local memory device (LMD) included in a hardware accelerator (1) a filter matrix corresponding to a filter location included in each of a set of filters of a convolutional layer of an artificial neural network (ANN), and (2) a set of activation vectors corresponding to an active region of an activation volume input into the convolutional layer. The method may also include determining that the active region of the activation volume is contiguous with a padding region associated with at least a portion of the activation volume. The method may further include directing a matrix multiplication unit (MMU) included in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and an activation matrix that may include (1) the set of activation vectors, and (2) at least one padding vector corresponding to the padding region.

Type: Grant

Filed: January 30, 2020

Date of Patent: November 15, 2022

Assignee: Meta Platforms, Inc.

Inventors: Krishnakumar Narayanan Nair, Ehsan Khish Ardestani, Martin Schatz, Yuchen Hao, Abdulkadir Utku Diril, Rakesh Komuravelli

Using a low-bit-width dot product engine to sum high-bit-width numbers

Patent number: 11455143

Abstract: A device (e.g., an integrated circuit chip) includes a dot product processing component, a data alignment component, and an accumulator. The dot product processing component is configured to calculate a dot product of a first group of elements stored in a first storage unit with a second group of elements, wherein: each element of the first group of elements is represented using a first number of bits, each value of a group of values stored in the first storage unit is represented using a second number of bits greater than the first number of bits, and each value of the group of values is stored as split segments across more than one element of the elements of the first group of elements. The data alignment component is configured to receive results of the dot product processing component and modify one or more of the results of the dot product processing component. The accumulator is configured to sum outputs of the data alignment component to at least in part determine a sum of the group of values.

Type: Grant

Filed: May 7, 2020

Date of Patent: September 27, 2022

Assignee: Meta Platforms, Inc.

Inventors: Thomas Mark Ulrich, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh

Pipelined pointwise convolution using per-channel convolution operations

Patent number: 11443013

Abstract: A processor system comprises a hardware channel convolution processor unit and dot product processor unit. The channel convolution processor unit is configured to perform depthwise convolution, including by multiplying each data element of a first group of data elements of a convolution data matrix with a corresponding data element of a second group of data elements of a plurality of depthwise convolution weight matrices and summing together, for each specific channel, multiplication results corresponding to the specific channel to determine one corresponding result data element in a corresponding channel convolution result matrix to calculate a portion of depthwise convolution results.
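The abstract describes only the depthwise (per-channel) half and mentions a dot product unit. Assuming, as the title suggests, that the dot product unit then applies a pointwise (1 x 1) convolution, the Python sketch below models the pipeline with generators: each depthwise result vector is consumed by the pointwise stage as soon as it is produced. Stride 1, no padding, and all names are our assumptions.

```python
def depthwise_stage(act, kernels):
    """Per-channel convolution; yields one channel vector per output position."""
    H, W, C = len(act), len(act[0]), len(act[0][0])
    R, S = len(kernels[0]), len(kernels[0][0])
    for oy in range(H - R + 1):
        for ox in range(W - S + 1):
            yield [sum(act[oy + r][ox + s][c] * kernels[c][r][s]
                       for r in range(R) for s in range(S))
                   for c in range(C)]


def pointwise_stage(channel_vectors, pw_weights):
    """1 x 1 convolution as dot products; consumes each vector as it arrives."""
    for vec in channel_vectors:
        yield [sum(v * w for v, w in zip(vec, wk)) for wk in pw_weights]


def separable_conv(act, kernels, pw_weights):
    """Depthwise followed by pointwise, one output position at a time."""
    return list(pointwise_stage(depthwise_stage(act, kernels), pw_weights))
```

For a 3 x 3 input where every pixel is `[1, 2]`, 2 x 2 all-ones kernels, and pointwise weights `[[1, 1], [1, -1]]`, each of the four output positions is `[12, -4]`.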

Type: Grant

Filed: March 23, 2020

Date of Patent: September 13, 2022

Assignee: Meta Platforms, Inc.

Inventors: Rakesh Komuravelli, Krishnakumar Narayanan Nair, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian
