Patents by Inventor Naveen Mellempudi

Naveen Mellempudi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240160931
    Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
    Type: Application
    Filed: December 7, 2023
    Publication date: May 16, 2024
    Applicant: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
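
A minimal sketch of the per-layer scaling this abstract describes, in NumPy. The symmetric int8 scheme, the max-based choice of scale factor, and the function names are illustrative assumptions; the abstract fixes only that a per-layer scale factor is applied and that tensor data is converted to an 8-bit datatype.

```python
import numpy as np

def quantize_per_layer(tensor, num_bits=8):
    """Convert a float32 tensor to int8 using one scale factor for the layer."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    scale = np.max(np.abs(tensor)) / qmax          # per-layer scale factor
    scale = scale if scale > 0 else 1.0            # guard for an all-zero layer
    q = np.clip(np.round(tensor / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def output_tensor(q, scale):
    """Generate an approximate output tensor from converted data plus scale."""
    return q.astype(np.float32) * scale
```
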
  • Publication number: 20240045677
    Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
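
The FP16-to-bfloat8 conversion above can be sketched as a byte truncation, on the assumption that bfloat8 (BF8, the E5M2 format) is the top byte of the IEEE half-precision encoding. Round-to-nearest-even is shown; NaN and overflow-to-infinity cases are not special-cased in this sketch.

```python
import numpy as np

def fp16_to_bf8(x):
    """Round float16 values to BF8 (1 sign, 5 exponent, 2 mantissa bits)."""
    bits = np.asarray(x, dtype=np.float16).view(np.uint16)
    lsb = (bits >> 8) & np.uint16(1)        # surviving mantissa LSB, for ties
    rounded = bits + np.uint16(0x7F) + lsb  # nearest-even; wraps on overflow
    return (rounded >> 8).astype(np.uint8)
```

For example, `fp16_to_bf8(np.float16([1.0, 1.375]))` yields bytes `0x3C` and `0x3E` (1.0 and 1.5), the halfway case rounding to the even mantissa.
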
  • Publication number: 20240045684
    Abstract: Techniques for converting FP16 to BF8 using bias are described.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
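
A sketch of conversion with an explicit bias, in the spirit of the VCVTBIASPH2BF8 instruction: the low byte of a per-element bias is added to the FP16 bit pattern before truncation, so supplying random bias bytes yields stochastic rounding. The patent's exact bias semantics are an assumption here, and overflow/NaN handling is omitted.

```python
import numpy as np

def fp16_to_bf8_bias(x, bias):
    """Truncate float16 to BF8 after adding a caller-supplied rounding bias."""
    bits = np.asarray(x, dtype=np.float16).view(np.uint16)
    biased = bits + (np.asarray(bias, dtype=np.uint16) & np.uint16(0x00FF))
    return (biased >> 8).astype(np.uint8)
```
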
  • Publication number: 20240045681
    Abstract: Techniques for comparing FP8 data elements are described. An exemplary FP8 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of the data elements at that position, and update a flags register based on the comparison.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
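
A scalar sketch of the comparison, using the standard bit-flip trick that makes IEEE-like encodings totally ordered as integers. The FP8 encoding, the x86-style ZF/CF flag semantics, and the omission of NaN and signed-zero handling are all assumptions.

```python
def compare_fp8(a: int, b: int) -> dict:
    """Compare two FP8 bit patterns and return COMIS-style flags."""
    # Negative values: flip all bits; positive values: flip only the sign bit.
    key = lambda v: v ^ 0xFF if v & 0x80 else v ^ 0x80
    ka, kb = key(a), key(b)
    return {"ZF": ka == kb, "CF": ka < kb}  # ZF: equal, CF: a < b
```
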
  • Publication number: 20240045683
    Abstract: Techniques for performing square root or reciprocal square root calculations on FP8 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of an FP8 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045688
    Abstract: Techniques for performing an FP8 FMA in response to an instruction are described. In some examples, an instruction has fields for an opcode, an identification of a location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, and an identification of a location of a third packed data source operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform an FP8 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand, wherein the FP8 value has an 8-bit floating-point format that comprises one bit for a sign, at least 4 bits for an exponent, and at least two bits for a fraction.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
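
The abstract above pins down the element format (one sign bit, at least four exponent bits, at least two fraction bits), which matches E4M3. A sketch decoding E4M3 and performing the per-position multiply-accumulate follows; the OCP-style special values and the float32 internal accumulation are assumptions.

```python
import numpy as np

def decode_e4m3(b: int) -> float:
    """Decode one E4M3 byte: 1 sign, 4 exponent (bias 7), 3 fraction bits."""
    s, e, m = b >> 7, (b >> 3) & 0xF, b & 0x7
    if e == 0xF and m == 0x7:                      # all-ones pattern is NaN
        return float("nan")
    mag = (m / 8.0) * 2.0 ** -6 if e == 0 else (1 + m / 8.0) * 2.0 ** (e - 7)
    return -mag if s else mag

def fp8_fma(acc, a_bits, b_bits):
    """Per data element position: acc + a * b, widened to float32."""
    a = np.array([decode_e4m3(v) for v in a_bits], dtype=np.float32)
    b = np.array([decode_e4m3(v) for v in b_bits], dtype=np.float32)
    return np.asarray(acc, dtype=np.float32) + a * b
```
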
  • Publication number: 20240045691
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Naveen Mellempudi, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Alexander Heinecke, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045687
    Abstract: Techniques for FP8 classification or manipulation using single instructions are described. An exemplary instruction includes fields for an opcode, an identification of a location of a packed data source operand, an indication of one or more classification checks to perform, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a classification according to the indicated one or more classification checks and store a result of the classification in a corresponding data element position of the destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
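
A sketch of the classification checks on E5M2 bit patterns (the format choice is an assumption); the check names loosely mirror the categories tested by VFPCLASS-style instructions.

```python
def classify_e5m2(b: int, checks) -> bool:
    """Test one E5M2 byte (1 sign, 5 exponent, 2 mantissa bits) against checks."""
    s, e, m = b >> 7, (b >> 2) & 0x1F, b & 0x3
    cat = {
        "zero":     e == 0 and m == 0,
        "denormal": e == 0 and m != 0,
        "inf":      e == 0x1F and m == 0,
        "nan":      e == 0x1F and m != 0,
        "negative": s == 1 and not (e == 0x1F and m != 0),
    }
    return any(cat[c] for c in checks)

def classify_packed(src, checks):
    """Store one classification result per data element position."""
    return [classify_e5m2(b, checks) for b in src]
```
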
  • Publication number: 20240045682
    Abstract: Techniques for scale and reduction of FP8 data elements are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating-point scale operation of an FP8 data element of the first packed data source by multiplying the data element by a power-of-2 value, wherein the value of the exponent of the power-of-2 value is the floor value of an FP8 data element of the second packed data source, and store a result of the floating-point scale operation into a corresponding data element position of the packed data destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
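
The scale operation above reduces to a * 2**floor(b) per element. A sketch over already-decoded FP8 values; re-encoding the float32 result back to FP8 is elided here.

```python
import numpy as np

def scalef_fp8(a_vals, b_vals):
    """Per-element floating-point scale: a * 2**floor(b)."""
    a = np.asarray(a_vals, dtype=np.float32)
    b = np.asarray(b_vals, dtype=np.float32)
    return a * np.exp2(np.floor(b))
```
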
  • Publication number: 20240045654
    Abstract: Techniques for performing arithmetic operations on FP8 values are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation that execution circuitry is to perform, for each data element position of the identified packed data source operands, on the FP8 data elements in that data element position in FP8 format, and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045686
    Abstract: Techniques for converting FP8 data elements to FP16 or FP32 data elements using a single instruction are described. An example apparatus includes decoder circuitry to decode a single instruction, the single instruction to indicate that execution circuitry is to convert packed FP8 data from the identified source to packed half-precision floating-point data or single-precision floating point data and store the packed half-precision floating-point data or single-precision floating point data into corresponding data element positions of the identified destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
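
Widening in the other direction is exact under the assumption that BF8 (E5M2) is the top byte of the half-precision encoding; no rounding is involved.

```python
import numpy as np

def bf8_to_fp16(b):
    """Widen BF8 bytes to float16 by placing each byte in the upper half."""
    return (np.asarray(b, dtype=np.uint16) << 8).view(np.float16)
```
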
  • Publication number: 20240045685
    Abstract: Systems, methods, and apparatuses relating to sparsity-based FMA. In some examples, an instance of a single FMA instruction has one or more fields for an opcode, one or more fields to identify a source/destination matrix operand, one or more fields to identify a first plurality of source matrix operands, one or more fields to identify a second plurality of matrix operands, wherein the opcode is to indicate that execution circuitry is to select a proper subset of FP8 data elements from the first plurality of source matrix operands based on sparsity controls from a first matrix operand of the second plurality of matrix operands and perform an FMA.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Menachem Adelman, Amit Gradstein, Alexander Heinecke, Christopher Hughes, Naveen Mellempudi, Shahar Mizrahi, Dana Rip, Simon Rubanovich, Uri Sherman, Guy Boudoukh, Evangelos Georganas, Nilesh Jain, Barukh Ziv
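
A sketch of the sparsity-controlled selection, assuming a 2:4 structured pattern (the abstract says only "sparsity controls"): for every group of four positions in the dense source, two metadata entries name which positions the compressed source's elements occupy in the multiply-accumulate.

```python
def sparse_fma(acc, a_dense, b_compressed, b_indices):
    """Accumulate products of selected FP8 pairs under 2:4 sparsity controls."""
    for g in range(len(a_dense) // 4):
        for k in range(2):
            idx = b_indices[2 * g + k]         # position 0-3 within the group
            acc += a_dense[4 * g + idx] * b_compressed[2 * g + k]
    return acc
```
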
  • Publication number: 20240045689
    Abstract: Disclosed embodiments relate to systems and methods for performing 8-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply pairs of 8-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
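
The dot-product accumulation above can be sketched over already-decoded FP8 values; grouping four products per float32 destination lane is an assumption about the element layout.

```python
import numpy as np

def fp8_dot_accumulate(dst, a_vals, b_vals):
    """Multiply FP8 pairs and accumulate into the prior float32 destination."""
    prods = np.asarray(a_vals, np.float32) * np.asarray(b_vals, np.float32)
    return np.asarray(dst, np.float32) + prods.reshape(len(dst), -1).sum(axis=1)
```
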
  • Patent number: 11893490
    Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: February 6, 2024
    Assignee: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
  • Publication number: 20230409891
    Abstract: One embodiment provides for a machine-learning accelerator device comprising a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit, the compute unit including a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction to scale an input tensor associated with a layer of a neural network according to a scale factor, the input tensor stored in a floating-point data type, the compute logic to scale the input tensor to enable a data distribution of data of the input tensor to be represented by a 16-bit floating point data type.
    Type: Application
    Filed: August 25, 2023
    Publication date: December 21, 2023
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Dipankar Das
  • Patent number: 11823034
    Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a first floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a second floating point data type.
    Type: Grant
    Filed: October 6, 2022
    Date of Patent: November 21, 2023
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Dipankar Das
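
The loss-scaling scheme described in the two entries above can be sketched as follows; `backward_fn` and the starting scale of 2**15 are illustrative assumptions, not taken from the patents.

```python
import numpy as np

def scaled_backward(loss, backward_fn, scale=2.0 ** 15):
    """Scale the loss so gradients stay representable in float16, then unscale."""
    grads16 = backward_fn(np.float16(loss * scale))   # backward pass in fp16
    return [np.asarray(g, np.float32) / scale for g in grads16]
```
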
  • Publication number: 20230315450
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Application
    Filed: May 5, 2023
    Publication date: October 5, 2023
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Publication number: 20230141038
    Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a first floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a second floating point data type.
    Type: Application
    Filed: October 6, 2022
    Publication date: May 11, 2023
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Dipankar Das
  • Publication number: 20230087364
    Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
    Type: Application
    Filed: November 30, 2022
    Publication date: March 23, 2023
    Applicant: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
  • Patent number: 11556772
    Abstract: One embodiment provides for a computing device comprising a parallel processor compute unit to perform a set of parallel integer compute operations; a ternarization unit including a weight ternarization circuit and an activation quantization circuit; wherein the weight ternarization circuit is to convert a weight tensor from a floating-point representation to a ternary representation including a ternary weight and a scale factor; wherein the activation quantization circuit is to convert an activation tensor from a floating-point representation to an integer representation; and wherein the parallel processor compute unit includes one or more circuits to perform the set of parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor.
    Type: Grant
    Filed: January 12, 2018
    Date of Patent: January 17, 2023
    Assignee: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
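
A sketch of the ternarization described in the last entry: the weight tensor becomes a ternary tensor plus a scale factor, and activations become int8. The 0.7 * mean|w| threshold is a common heuristic and an assumption, not taken from the patent.

```python
import numpy as np

def ternarize(w):
    """Convert a float weight tensor to (ternary values, scale factor)."""
    thresh = 0.7 * np.mean(np.abs(w))
    t = np.where(w > thresh, 1, np.where(w < -thresh, -1, 0)).astype(np.int8)
    nz = t != 0
    scale = float(np.mean(np.abs(w[nz]))) if nz.any() else 0.0
    return t, scale

def quantize_activation(a, qmax=127):
    """Convert a float activation tensor to an int8 representation."""
    s = np.max(np.abs(a)) / qmax or 1.0               # guard for all-zero input
    return np.clip(np.round(a / s), -qmax, qmax).astype(np.int8), s
```
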