Patents by Inventor Krishnakumar Nair
Krishnakumar Nair has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12254061
Abstract: Methods and apparatuses relating to performing vector multiplication are described. Hardware accelerators to perform vector multiplication are also described.
Type: Grant
Filed: September 27, 2018
Date of Patent: March 18, 2025
Assignee: Intel Corporation
Inventors: Maciej Urbanski, Brian J. Hickmann, Michael Rotzin, Krishnakumar Nair, Andrew Yang, Brian S. Morris, Dennis Bradford
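The abstract gives no implementation detail, but as a rough illustration, a hardware vector-multiply accelerator is often modeled as a fixed-width multiply lane that consumes operands in chunks. A minimal Python sketch of that model (the lane width and function name are hypothetical, not from the patent):

```python
import numpy as np

LANE_WIDTH = 8  # hypothetical number of multipliers in the hardware lane

def vector_multiply(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Element-wise vector multiply, processed LANE_WIDTH elements at a
    time to mimic a fixed-width hardware multiply lane."""
    assert a.shape == b.shape
    out = np.empty_like(a)
    for i in range(0, len(a), LANE_WIDTH):
        # One "cycle": all multipliers in the lane fire in parallel.
        out[i:i + LANE_WIDTH] = a[i:i + LANE_WIDTH] * b[i:i + LANE_WIDTH]
    return out

a = np.arange(16, dtype=np.float32)
b = np.full(16, 2.0, dtype=np.float32)
print(vector_multiply(a, b))
```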
-
Patent number: 12205035
Abstract: The present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating point values, and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.
Type: Grant
Filed: June 8, 2018
Date of Patent: January 21, 2025
Assignee: Intel Corporation
Inventors: Krishnakumar Nair, Andrew Yang, Brian Morris
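As a rough illustration of the shared-exponent scheme described above, the sketch below scales a tensor by a single power-of-two exponent chosen from its largest magnitude and stores the residuals as IEEE 754 FP16 values. This is a minimal model assuming a max-magnitude exponent choice; the patent's exact encoding and its variable bit-length variants are not reproduced here.

```python
import numpy as np

def encode_shared_exponent(t: np.ndarray):
    """Split a float32 tensor into a shared power-of-two exponent plus
    FP16 residual values (illustrative, assumed encoding)."""
    # Shared exponent taken from the largest magnitude in the tensor.
    max_abs = np.max(np.abs(t))
    shared_exp = int(np.floor(np.log2(max_abs))) if max_abs > 0 else 0
    # Store value / 2**shared_exp in IEEE 754 FP16.
    fp16_vals = (t / np.float32(2.0 ** shared_exp)).astype(np.float16)
    return shared_exp, fp16_vals

def decode_shared_exponent(shared_exp: int, fp16_vals: np.ndarray) -> np.ndarray:
    """Recombine the FP16 values with the shared exponent."""
    return fp16_vals.astype(np.float32) * np.float32(2.0 ** shared_exp)

t = np.array([1.5e5, -3.2e4, 7.7e4], dtype=np.float32)
exp, vals = encode_shared_exponent(t)
print(exp, decode_shared_exponent(exp, vals))  # values recovered to FP16 precision
```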
-
Publication number: 20240028905
Abstract: The present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating point values, and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.
Type: Application
Filed: September 29, 2023
Publication date: January 25, 2024
Inventors: Krishnakumar Nair, Andrew Yang, Brian Morris
-
Publication number: 20230252263
Abstract: A system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.
Type: Application
Filed: April 13, 2023
Publication date: August 10, 2023
Inventors: Krishnakumar Nair, Dheevatsa Mudigere, Abdulkadir Utku Diril
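A minimal sketch of the dataflow in this abstract: two processing elements each run a matrix computation, and the first element's result travels over a point-to-point link to a data joiner in the second element. The class and function names are hypothetical, and concatenation stands in for whatever join the hardware performs.

```python
import numpy as np

class ProcessingElement:
    """Toy processing element with a matrix computing unit
    (hypothetical model, not the actual hardware)."""
    def __init__(self, weights: np.ndarray):
        self.weights = weights

    def matrix_compute(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weights

def data_joiner(upstream_result: np.ndarray, local_result: np.ndarray) -> np.ndarray:
    """Join the result arriving over the point-to-point connection with
    the local matrix computing unit's result (here: concatenation)."""
    return np.concatenate([upstream_result, local_result], axis=-1)

x = np.random.rand(2, 4).astype(np.float32)
pe1 = ProcessingElement(np.random.rand(4, 3).astype(np.float32))
pe2 = ProcessingElement(np.random.rand(4, 3).astype(np.float32))

# PE1's result travels over the point-to-point link, bypassing the shared
# bus, and is joined with PE2's result inside PE2's data joiner component.
joined = data_joiner(pe1.matrix_compute(x), pe2.matrix_compute(x))
print(joined.shape)  # (2, 6)
```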
-
Patent number: 11688032
Abstract: A processor system comprises a memory organizer unit and a matrix computing unit. The memory organizer unit is configured to receive a request for three-dimensional data of a convolutional neural network layer. The requested three-dimensional data is obtained from a memory. The obtained three-dimensional data is rearranged in an optimized linear order, and the rearranged data in the optimized linear order is provided to the matrix computing unit. The matrix computing unit is configured to perform at least a portion of a three-dimensional convolution using at least a portion of the provided rearranged data in the optimized linear order.
Type: Grant
Filed: August 16, 2019
Date of Patent: June 27, 2023
Assignee: Meta Platforms, Inc.
Inventors: Dheevatsa Mudigere, Krishnakumar Nair, Abdulkadir Utku Diril
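The abstract does not say what the "optimized linear order" is; an im2col-style rearrangement, which flattens each convolution window into one row that a matrix unit can consume directly, is a common choice and is used below purely as an illustration.

```python
import numpy as np

def organize_3d(volume: np.ndarray, k: int) -> np.ndarray:
    """Rearrange a (D, H, W) volume into rows of flattened k*k*k patches,
    an im2col-style linear order a matrix unit can consume directly.
    (Illustrative only; the patent does not specify the exact order.)"""
    D, H, W = volume.shape
    rows = []
    for d in range(D - k + 1):
        for h in range(H - k + 1):
            for w in range(W - k + 1):
                rows.append(volume[d:d + k, h:h + k, w:w + k].ravel())
    return np.stack(rows)  # each row feeds one matrix-unit dot product

vol = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4)
patches = organize_3d(vol, k=2)
kernel = np.random.rand(2 * 2 * 2).astype(np.float32)
out = patches @ kernel  # 3D convolution reduced to one matrix-vector product
print(out.shape)  # (27,) — one value per 3D window position
```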
-
Patent number: 11657252
Abstract: A microprocessor system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.
Type: Grant
Filed: June 7, 2019
Date of Patent: May 23, 2023
Assignee: Meta Platforms, Inc.
Inventors: Krishnakumar Nair, Dheevatsa Mudigere, Abdulkadir Utku Diril
-
Patent number: 11640537
Abstract: An apparatus to facilitate execution of non-linear function operations is disclosed. The apparatus comprises accelerator circuitry including a compute grid having a plurality of processing elements to execute neural network computations, store values resulting from the neural network computations, and perform piecewise linear (PWL) approximations of one or more non-linear functions using the stored values as input data.
Type: Grant
Filed: April 8, 2019
Date of Patent: May 2, 2023
Assignee: Intel Corporation
Inventors: Bharat Daga, Krishnakumar Nair, Pradeep Janedula, Aravind Babu Srinivasan, Bijoy Pazhanimala, Ambili Vengallur
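As an illustration of piecewise linear approximation, the sketch below approximates a sigmoid from a table of precomputed per-segment slopes and intercepts. The segment count and the choice of sigmoid are stand-ins; the patent does not disclose its table contents.

```python
import numpy as np

# Hypothetical PWL table: breakpoints with a precomputed slope/intercept
# per segment, approximating sigmoid on [-6, 6].
xs = np.linspace(-6.0, 6.0, 17)
ys = 1.0 / (1.0 + np.exp(-xs))
slopes = np.diff(ys) / np.diff(xs)
intercepts = ys[:-1] - slopes * xs[:-1]

def pwl_sigmoid(x: np.ndarray) -> np.ndarray:
    """Piecewise-linear sigmoid: find the segment containing x, then
    evaluate slope * x + intercept — the PWL scheme the abstract names."""
    x = np.clip(x, xs[0], xs[-1] - 1e-6)
    seg = np.searchsorted(xs, x, side="right") - 1
    return slopes[seg] * x + intercepts[seg]

x = np.array([-3.0, 0.0, 2.5])
print(pwl_sigmoid(x))            # PWL approximation
print(1.0 / (1.0 + np.exp(-x)))  # exact values, for comparison
```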
-
Patent number: 11481471
Abstract: A system comprises a matrix processor unit that includes a first type of register, a group of a second type of registers, and a plurality of calculation units. The first type of register is configured to concurrently store values from different rows of a first matrix. At least a portion of the first type of register is logically divided into groups of elements, and each of the groups corresponds to a different row of the first matrix. Each of the second type of registers is configured to concurrently store values from a plurality of different rows of a second matrix. Each of the calculation units corresponds to one of the second type of registers and is configured to at least in part determine a corresponding element in a result matrix of convolving the second matrix with the first matrix.
Type: Grant
Filed: August 16, 2019
Date of Patent: October 25, 2022
Assignee: Meta Platforms, Inc.
Inventors: Krishnakumar Nair, Abdulkadir Utku Diril, Dheevatsa Mudigere, Olivia Wu, Ehsan Khish Ardestani Zadeh, Yuchen Hao
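A toy software model of the register layout described above: one wide register holds the rows of the first (weight) matrix, each window of the second matrix plays the role of a second-type register, and a dot product stands in for one calculation unit. Illustrative only, not the actual datapath.

```python
import numpy as np

def matrix_processor_convolve(weights: np.ndarray, data: np.ndarray) -> np.ndarray:
    """Toy model of the abstract's register layout: the first-type register
    holds the weight rows (logically grouped by row); each data window acts
    as one second-type register; each dot product is one calculation unit."""
    kh, kw = weights.shape
    H, W = data.shape
    weight_register = weights.ravel()  # first-type register, grouped by row
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=data.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            data_register = data[i:i + kh, j:j + kw].ravel()  # second-type register
            out[i, j] = weight_register @ data_register       # one calculation unit
    return out

w = np.ones((3, 3), dtype=np.float32)
d = np.arange(25, dtype=np.float32).reshape(5, 5)
print(matrix_processor_convolve(w, d))
```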
-
Patent number: 11138292
Abstract: An electronic circuit performs depthwise convolution of an input matrix with a kernel matrix to generate an output matrix. In each of a plurality of rounds of operations, a row of kernel matrix elements is selected for the round of operations, and applied to the input matrix to obtain an intermediate data array corresponding to the selected row of kernel elements. The electronic circuit includes a plurality of subcircuits operable in parallel to generate, in each operation, a set of intermediate data elements in the intermediate data array. Each subcircuit generates a respective intermediate data element that is the sum of a respective row of the input matrix elements weighted by a set of weight elements including the selected row of kernel elements and at least one zero element. The selected row of kernel elements is successively shifted among the set of weight elements in the round of operations.
Type: Grant
Filed: May 16, 2019
Date of Patent: October 5, 2021
Assignee: FACEBOOK, INC.
Inventors: Krishnakumar Nair, Abdulkadir Utku Diril, Dheevatsa Mudigere, Ehsan Khish Ardestani Zadeh, Olivia Wu, Yuchen Hao
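A minimal sketch of the round structure the abstract describes: one round per kernel row, with that row applied at successive positions across each input row and the partial sums accumulated into the output. Sliding a window over the input row, as done below, is mathematically equivalent to shifting the kernel row within a zero-padded weight set; the loops model what the parallel subcircuits compute.

```python
import numpy as np

def depthwise_row_rounds(inp: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2D convolution (one depthwise channel) computed as the abstract
    describes: one round per kernel row, accumulating shifted weighted
    sums of input rows into the output."""
    H, W = inp.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=inp.dtype)
    for r in range(kh):                   # one round per selected kernel row
        krow = kernel[r]
        for i in range(out.shape[0]):     # subcircuits, one per output row
            row = inp[i + r]
            for s in range(out.shape[1]): # kernel row shifted across the input row
                out[i, s] += row[s:s + kw] @ krow
    return out

inp = np.arange(16, dtype=np.float32).reshape(4, 4)
kern = np.array([[1, 0], [0, 1]], dtype=np.float32)
print(depthwise_row_rounds(inp, kern))
```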
-
Patent number: 11106430
Abstract: A circuit and method for calculating a non-linear function of floating-point numbers using hierarchical look-up tables are provided. The look-up tables are programmable to hold non-linear ranges of values for any of a variety of non-linear functions. The circuit includes computation modules in respective stages of a high-throughput computation pipeline. A first computation module in a first stage receives one or more floating-point numbers and, for each floating-point number, selects a first entry from a first look-up table based on the floating-point number. The first computation module then calculates and outputs a table index and a variable based on the floating-point number and the first entry. A second computation module in a second stage selects a second entry from a second look-up table based on the table index, and calculates and outputs an approximate value for the non-linear function using the variable and the second entry.
Type: Grant
Filed: May 16, 2019
Date of Patent: August 31, 2021
Assignee: FACEBOOK, INC.
Inventors: Anup Ramesh Kadkol, Krishnakumar Nair
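A sketch of the two-stage pipeline: a first module keys a coarse look-up on the argument's range and emits a table index plus a variable, and a second module uses that index to fetch linear coefficients from a second table. The function approximated here (1/x) and all table geometry are hypothetical; only the hierarchical-table structure follows the abstract.

```python
import numpy as np

# Two hierarchical, programmable look-up tables approximating 1/x on [1, 4).
N_COARSE, N_FINE = 2, 16  # coarse segments [1,2) and [2,4); fine bins per segment
fine_edges = [np.linspace(1.0, 2.0, N_FINE + 1), np.linspace(2.0, 4.0, N_FINE + 1)]

def stage1(x: float):
    """First computation module: pick the coarse segment from the
    argument's range, emit a table index and a variable."""
    coarse = 0 if x < 2.0 else 1
    edges = fine_edges[coarse]
    fine = min(int((x - edges[0]) / (edges[1] - edges[0])), N_FINE - 1)
    return coarse * N_FINE + fine, x  # (table index, variable)

# Second look-up table: precomputed slope/intercept of 1/x per fine segment.
slopes, intercepts = [], []
for edges in fine_edges:
    for a, b in zip(edges[:-1], edges[1:]):
        s = (1 / b - 1 / a) / (b - a)
        slopes.append(s)
        intercepts.append(1 / a - s * a)

def stage2(index: int, var: float) -> float:
    """Second computation module: linear approximation from the second table."""
    return slopes[index] * var + intercepts[index]

for x in (1.3, 2.7, 3.9):
    print(x, stage2(*stage1(x)), 1.0 / x)  # approximation vs. exact
```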
-
Publication number: 20210263993
Abstract: Methods and apparatuses relating to performing vector multiplication are described. Hardware accelerators to perform vector multiplication are also described.
Type: Application
Filed: September 27, 2018
Publication date: August 26, 2021
Inventors: Maciej Urbanski, Brian J. Hickmann, Michael Rotzin, Krishnakumar Nair, Andrew Yang, Brian S. Morris, Dennis Bradford
-
Publication number: 20210049229
Abstract: A system comprises a matrix processor unit that includes a first type of register, a group of a second type of registers, and a plurality of calculation units. The first type of register is configured to concurrently store values from different rows of a first matrix. At least a portion of the first type of register is logically divided into groups of elements, and each of the groups corresponds to a different row of the first matrix. Each of the second type of registers is configured to concurrently store values from a plurality of different rows of a second matrix. Each of the calculation units corresponds to one of the second type of registers and is configured to at least in part determine a corresponding element in a result matrix of convolving the second matrix with the first matrix.
Type: Application
Filed: August 16, 2019
Publication date: February 18, 2021
Inventors: Krishnakumar Nair, Abdulkadir Utku Diril, Dheevatsa Mudigere, Olivia Wu, Ehsan Khish Ardestani Zadeh, Yuchen Hao
-
Publication number: 20210049426
Abstract: A processor system comprises a memory organizer unit and a matrix computing unit. The memory organizer unit is configured to receive a request for three-dimensional data of a convolutional neural network layer. The requested three-dimensional data is obtained from a memory. The obtained three-dimensional data is rearranged in an optimized linear order, and the rearranged data in the optimized linear order is provided to the matrix computing unit. The matrix computing unit is configured to perform at least a portion of a three-dimensional convolution using at least a portion of the provided rearranged data in the optimized linear order.
Type: Application
Filed: August 16, 2019
Publication date: February 18, 2021
Inventors: Dheevatsa Mudigere, Krishnakumar Nair, Abdulkadir Utku Diril
-
Publication number: 20200387771
Abstract: A microprocessor system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.
Type: Application
Filed: June 7, 2019
Publication date: December 10, 2020
Inventors: Krishnakumar Nair, Dheevatsa Mudigere, Abdulkadir Utku Diril
-
Publication number: 20200364047
Abstract: A microprocessor comprises a shared memory and a processing element. The processing element includes a matrix processor unit, a transpose hardware unit, a scatter hardware unit, and a gather hardware unit. The matrix processor unit is configured to perform a matrix operation. The transpose hardware unit is configured to perform a matrix transpose operation. The scatter hardware unit is configured to place data in the shared memory at locations selected for an output data layout conversion. The gather hardware unit is configured to obtain input data from the shared memory from non-contiguous locations for an input data layout conversion.
Type: Application
Filed: May 16, 2019
Publication date: November 19, 2020
Inventors: Ehsan Khish Ardestani Zadeh, Krishnakumar Nair, Abdulkadir Utku Diril, Dheevatsa Mudigere, Olivia Wu, Yuchen Hao
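A toy numpy model of those units operating on a shared memory: gather reads input from non-contiguous locations, the transpose step reorders the tile, and scatter writes the result to selected locations. Indices and layouts are invented for illustration.

```python
import numpy as np

# Shared memory modeled as a flat array; all names and layouts are illustrative.
shared_memory = np.arange(24, dtype=np.float32)

def gather(mem: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Gather unit: read input from non-contiguous locations
    (input data layout conversion)."""
    return mem[indices]

def scatter(mem: np.ndarray, indices: np.ndarray, values: np.ndarray) -> None:
    """Scatter unit: place data at selected locations
    (output data layout conversion)."""
    mem[indices] = values

# Gather a 2x3 tile stored strided in shared memory.
in_idx = np.array([0, 4, 8, 12, 16, 20])
tile = gather(shared_memory, in_idx).reshape(2, 3)

tile_t = tile.T  # transpose unit: matrix transpose on the tile

# Scatter the transposed tile back, contiguously this time.
out = np.zeros(6, dtype=np.float32)
scatter(out, np.arange(6), tile_t.ravel())
print(out)
```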
-
Publication number: 20200320403
Abstract: An apparatus to facilitate execution of non-linear function operations is disclosed. The apparatus comprises accelerator circuitry including a compute grid having a plurality of processing elements to execute neural network computations, store values resulting from the neural network computations, and perform piecewise linear (PWL) approximations of one or more non-linear functions using the stored values as input data.
Type: Application
Filed: April 8, 2019
Publication date: October 8, 2020
Applicant: Intel Corporation
Inventors: Bharat Daga, Krishnakumar Nair, Pradeep Janedula, Aravind Babu Srinivasan, Bijoy Pazhanimala, Ambili Vengallur
-
Patent number: 10761757
Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
Type: Grant
Filed: June 30, 2018
Date of Patent: September 1, 2020
Assignee: Intel Corporation
Inventors: Krishnakumar Nair, Andrew Yang, Michael Rotzin, Nitin Garegrat, Tom Schebye, Tony Werner
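A rough sketch of the conversion flow: fetch one block of the source tensor at a time, convert its numeric representation (float32 to float16 here), and store the destination block in the region that mirrors the source arrangement. The block size and dtype pair are assumptions; the patent does not fix them.

```python
import numpy as np

BLOCK = 4  # hypothetical block size; the patent's block geometry is not given

def convert_tensor(src: np.ndarray, dst_dtype=np.float16) -> np.ndarray:
    """Convert a tensor block by block from one numeric representation to
    another, writing each destination block to the position that mirrors
    the source arrangement (a rough sketch of the abstract's flow)."""
    dst = np.empty(src.shape, dtype=dst_dtype)
    flat_src, flat_dst = src.ravel(), dst.ravel()  # views over both tensors
    for i in range(0, flat_src.size, BLOCK):
        # Fetch one source block, convert its numeric representation, and
        # store it in the matching destination region to keep coherency.
        flat_dst[i:i + BLOCK] = flat_src[i:i + BLOCK].astype(dst_dtype)
    return dst

src = np.random.rand(4, 8).astype(np.float32)
dst = convert_tensor(src)
print(src.dtype, "->", dst.dtype, dst.shape)
```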
-
Publication number: 20190042094
Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
Type: Application
Filed: June 30, 2018
Publication date: February 7, 2019
Inventors: Krishnakumar Nair, Andrew Yang, Michael Rotzin, Nitin Garegrat, Tom Schebye, Tony Werner
-
Publication number: 20190042944
Abstract: The present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating point values, and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.
Type: Application
Filed: June 8, 2018
Publication date: February 7, 2019
Applicant: Intel Corporation
Inventors: Krishnakumar Nair, Andrew Yang, Brian Morris
-
Patent number: 10079721
Abstract: A digital network assistant that can detect network anomalies, identify actions likely to remediate them, and assist the user in carrying out those actions. In particular, the digital network assistant constantly monitors data streams associated with the network to determine key performance indicators for the network. When these key performance indicators indicate a network anomaly, the digital network assistant associates it, via a digital string, with one or more actions likely to remediate similar network issues. The digital network assistant can take these actions automatically or present them to a user to be taken. The system can also aid the user in taking the required actions via an augmented reality interface. In addition, the system can create narratives that embed findings from the data analysis, eliminating subjectivity. The system can also find optimal parameter sets by continuously analyzing anomaly-free parts of the network and their key performance indicators.
Type: Grant
Filed: April 22, 2016
Date of Patent: September 18, 2018
Assignee: Netsights360
Inventors: Jithesh Kizhakkekkara Nair, Sunil Ponnangath Nair, Navin Babu Irimpan, Krishnakumar Nair
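As a loose illustration of the monitoring loop, the sketch below computes z-scores over a KPI stream, flags samples beyond a threshold, and maps the KPI to a remediation suggestion. The KPI names, threshold, and remediation map are all invented; the patent's analytics are not public beyond this abstract.

```python
import numpy as np

# Hypothetical remediation map; the patent does not publish its action set.
REMEDIATIONS = {
    "latency_ms": "restart congested link",
    "packet_loss": "reroute traffic",
}

def detect_anomalies(kpi_name: str, samples: np.ndarray, z_thresh: float = 3.0):
    """Flag KPI samples whose z-score exceeds a threshold and pair each
    with the mapped remediation — a minimal stand-in for the assistant's
    monitoring loop."""
    mu, sigma = samples.mean(), samples.std()
    if sigma == 0:
        return []
    anomalies = np.where(np.abs(samples - mu) / sigma > z_thresh)[0]
    return [(int(i), REMEDIATIONS.get(kpi_name, "escalate to operator"))
            for i in anomalies]

latency = np.concatenate([np.random.normal(20, 1, 200), [95.0]])  # one spike
print(detect_anomalies("latency_ms", latency))  # flags the spike at index 200
```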