Patents by Inventor Naveen Mellempudi

Naveen Mellempudi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240160931
    Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
    Type: Application
    Filed: December 7, 2023
    Publication date: May 16, 2024
    Applicant: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
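
A minimal sketch of the per-layer scaling this abstract describes, in NumPy. The symmetric int8 scheme, the max-based choice of scale factor, and the function names are illustrative assumptions; the abstract fixes only that a per-layer scale factor is applied and that tensor data is converted to an 8-bit datatype.

```python
import numpy as np

def quantize_per_layer(tensor, num_bits=8):
    """Convert a float32 tensor to int8 using one scale factor for the layer."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    scale = np.max(np.abs(tensor)) / qmax          # per-layer scale factor
    scale = scale if scale > 0 else 1.0            # guard for an all-zero layer
    q = np.clip(np.round(tensor / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def output_tensor(q, scale):
    """Generate an approximate output tensor from converted data plus scale."""
    return q.astype(np.float32) * scale
```
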
  • Publication number: 20240045677
    Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
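
The FP16-to-bfloat8 conversion above can be sketched as a byte truncation, on the assumption that bfloat8 (BF8, the E5M2 format) is the top byte of the IEEE half-precision encoding. Round-to-nearest-even is shown; NaN and overflow-to-infinity cases are not special-cased in this sketch.

```python
import numpy as np

def fp16_to_bf8(x):
    """Round float16 values to BF8 (1 sign, 5 exponent, 2 mantissa bits)."""
    bits = np.asarray(x, dtype=np.float16).view(np.uint16)
    lsb = (bits >> 8) & np.uint16(1)        # surviving mantissa LSB, for ties
    rounded = bits + np.uint16(0x7F) + lsb  # nearest-even; wraps on overflow
    return (rounded >> 8).astype(np.uint8)
```

For example, `fp16_to_bf8(np.float16([1.0, 1.375]))` yields bytes `0x3C` and `0x3E` (1.0 and 1.5), the halfway case rounding to the even mantissa.
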
  • Publication number: 20240045684
    Abstract: Techniques for converting FP16 to BF8 using bias are described.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
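
A sketch of conversion with an explicit bias, in the spirit of the VCVTBIASPH2BF8 instruction: the low byte of a per-element bias is added to the FP16 bit pattern before truncation, so supplying random bias bytes yields stochastic rounding. The patent's exact bias semantics are an assumption here, and overflow/NaN handling is omitted.

```python
import numpy as np

def fp16_to_bf8_bias(x, bias):
    """Truncate float16 to BF8 after adding a caller-supplied rounding bias."""
    bits = np.asarray(x, dtype=np.float16).view(np.uint16)
    biased = bits + (np.asarray(bias, dtype=np.uint16) & np.uint16(0x00FF))
    return (biased >> 8).astype(np.uint8)
```
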
  • Publication number: 20240045681
    Abstract: Techniques for comparing FP8 data elements are described. An exemplary FP8 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of the data elements at that position, and update a flags register based on the comparison.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
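
A scalar sketch of the comparison, using the standard bit-flip trick that makes IEEE-like encodings totally ordered as integers. The FP8 encoding, the x86-style ZF/CF flag semantics, and the omission of NaN and signed-zero handling are all assumptions.

```python
def compare_fp8(a: int, b: int) -> dict:
    """Compare two FP8 bit patterns and return COMIS-style flags."""
    # Negative values: flip all bits; positive values: flip only the sign bit.
    key = lambda v: v ^ 0xFF if v & 0x80 else v ^ 0x80
    ka, kb = key(a), key(b)
    return {"ZF": ka == kb, "CF": ka < kb}  # ZF: equal, CF: a < b
```
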
  • Publication number: 20240045683
    Abstract: Techniques for performing square root or reciprocal square root calculations on FP8 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of an FP8 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045688
    Abstract: Techniques for performing an FP8 FMA in response to an instruction are described. In some examples, an instruction has fields for an opcode, an identification of a location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, and an identification of a location of a third packed data source operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform an FP8 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand, wherein the FP8 value has an 8-bit floating-point format that comprises one bit for a sign, at least 4 bits for an exponent, and at least two bits for a fraction.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
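
The abstract above pins down the element format (one sign bit, at least four exponent bits, at least two fraction bits), which matches E4M3. A sketch decoding E4M3 and performing the per-position multiply-accumulate follows; the OCP-style special values and the float32 internal accumulation are assumptions.

```python
import numpy as np

def decode_e4m3(b: int) -> float:
    """Decode one E4M3 byte: 1 sign, 4 exponent (bias 7), 3 fraction bits."""
    s, e, m = b >> 7, (b >> 3) & 0xF, b & 0x7
    if e == 0xF and m == 0x7:                      # all-ones pattern is NaN
        return float("nan")
    mag = (m / 8.0) * 2.0 ** -6 if e == 0 else (1 + m / 8.0) * 2.0 ** (e - 7)
    return -mag if s else mag

def fp8_fma(acc, a_bits, b_bits):
    """Per data element position: acc + a * b, widened to float32."""
    a = np.array([decode_e4m3(v) for v in a_bits], dtype=np.float32)
    b = np.array([decode_e4m3(v) for v in b_bits], dtype=np.float32)
    return np.asarray(acc, dtype=np.float32) + a * b
```
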
  • Publication number: 20240045691
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Naveen Mellempudi, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Alexander Heinecke, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045687
    Abstract: Techniques for FP8 classification or manipulation using single instructions are described. An exemplary instruction includes fields for an opcode, an identification of a location of a packed data source operand, an indication of one or more classification checks to perform, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a classification according to the indicated one or more classification checks and store a result of the classification in a corresponding data element position of the destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
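
A sketch of the classification checks on E5M2 bit patterns (the format choice is an assumption); the check names loosely mirror the categories tested by VFPCLASS-style instructions.

```python
def classify_e5m2(b: int, checks) -> bool:
    """Test one E5M2 byte (1 sign, 5 exponent, 2 mantissa bits) against checks."""
    s, e, m = b >> 7, (b >> 2) & 0x1F, b & 0x3
    cat = {
        "zero":     e == 0 and m == 0,
        "denormal": e == 0 and m != 0,
        "inf":      e == 0x1F and m == 0,
        "nan":      e == 0x1F and m != 0,
        "negative": s == 1 and not (e == 0x1F and m != 0),
    }
    return any(cat[c] for c in checks)

def classify_packed(src, checks):
    """Store one classification result per data element position."""
    return [classify_e5m2(b, checks) for b in src]
```
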
  • Publication number: 20240045682
    Abstract: Techniques for scale and reduction of FP8 data elements are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating-point scale operation of an FP8 data element of the first packed data source by multiplying the data element by a power-of-2 value, wherein the value of the exponent of the power-of-2 value is the floor value of an FP8 data element of the second packed data source, and store a result of the floating-point scale operation into a corresponding data element position of the packed data destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
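
The scale operation above reduces to a * 2**floor(b) per element. A sketch over already-decoded FP8 values; re-encoding the float32 result back to FP8 is elided here.

```python
import numpy as np

def scalef_fp8(a_vals, b_vals):
    """Per-element floating-point scale: a * 2**floor(b)."""
    a = np.asarray(a_vals, dtype=np.float32)
    b = np.asarray(b_vals, dtype=np.float32)
    return a * np.exp2(np.floor(b))
```
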
  • Publication number: 20240045654
    Abstract: Techniques for performing arithmetic operations on FP8 values are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation that execution circuitry is to perform, for each data element position of the identified packed data source operands, on the FP8 data elements in that data element position in FP8 format, and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045686
    Abstract: Techniques for converting FP8 data elements to FP16 or FP32 data elements using a single instruction are described. An example apparatus includes decoder circuitry to decode a single instruction, the single instruction to indicate that execution circuitry is to convert packed FP8 data from the identified source to packed half-precision floating-point data or single-precision floating point data and store the packed half-precision floating-point data or single-precision floating point data into corresponding data element positions of the identified destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
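
Widening in the other direction is exact under the assumption that BF8 (E5M2) is the top byte of the half-precision encoding; no rounding is involved.

```python
import numpy as np

def bf8_to_fp16(b):
    """Widen BF8 bytes to float16 by placing each byte in the upper half."""
    return (np.asarray(b, dtype=np.uint16) << 8).view(np.float16)
```
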
  • Publication number: 20240045685
    Abstract: Systems, methods, and apparatuses relating to sparsity-based FMA. In some examples, an instance of a single FMA instruction has one or more fields for an opcode, one or more fields to identify a source/destination matrix operand, one or more fields to identify a first plurality of source matrix operands, one or more fields to identify a second plurality of matrix operands, wherein the opcode is to indicate that execution circuitry is to select a proper subset of FP8 data elements from the first plurality of source matrix operands based on sparsity controls from a first matrix operand of the second plurality of matrix operands and perform an FMA.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Menachem Adelman, Amit Gradstein, Alexander Heinecke, Christopher Hughes, Naveen Mellempudi, Shahar Mizrahi, Dana Rip, Simon Rubanovich, Uri Sherman, Guy Boudoukh, Evangelos Georganas, Nilesh Jain, Barukh Ziv
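
A sketch of the sparsity-controlled selection, assuming a 2:4 structured pattern (the abstract says only "sparsity controls"): for every group of four positions in the dense source, two metadata entries name which positions the compressed source's elements occupy in the multiply-accumulate.

```python
def sparse_fma(acc, a_dense, b_compressed, b_indices):
    """Accumulate products of selected FP8 pairs under 2:4 sparsity controls."""
    for g in range(len(a_dense) // 4):
        for k in range(2):
            idx = b_indices[2 * g + k]         # position 0-3 within the group
            acc += a_dense[4 * g + idx] * b_compressed[2 * g + k]
    return acc
```
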
  • Publication number: 20240045689
    Abstract: Disclosed embodiments relate to systems and methods for performing 8-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply pairs of 8-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
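
The dot-product accumulation above can be sketched over already-decoded FP8 values; grouping four products per float32 destination lane is an assumption about the element layout.

```python
import numpy as np

def fp8_dot_accumulate(dst, a_vals, b_vals):
    """Multiply FP8 pairs and accumulate into the prior float32 destination."""
    prods = np.asarray(a_vals, np.float32) * np.asarray(b_vals, np.float32)
    return np.asarray(dst, np.float32) + prods.reshape(len(dst), -1).sum(axis=1)
```
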
  • Patent number: 11893490
    Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: February 6, 2024
    Assignee: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
  • Publication number: 20230409891
    Abstract: One embodiment provides for a machine-learning accelerator device comprising a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit, the compute unit including a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction to scale an input tensor associated with a layer of a neural network according to a scale factor, the input tensor stored in a floating-point data type, the compute logic to scale the input tensor to enable a data distribution of data of the input tensor to be represented by a 16-bit floating point data type.
    Type: Application
    Filed: August 25, 2023
    Publication date: December 21, 2023
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Dipankar Das
  • Patent number: 11823034
    Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a first floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a second floating point data type.
    Type: Grant
    Filed: October 6, 2022
    Date of Patent: November 21, 2023
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Dipankar Das
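
The loss-scaling scheme described in the two entries above can be sketched as follows; `backward_fn` and the starting scale of 2**15 are illustrative assumptions, not taken from the patents.

```python
import numpy as np

def scaled_backward(loss, backward_fn, scale=2.0 ** 15):
    """Scale the loss so gradients stay representable in float16, then unscale."""
    grads16 = backward_fn(np.float16(loss * scale))   # backward pass in fp16
    return [np.asarray(g, np.float32) / scale for g in grads16]
```
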
  • Publication number: 20230315450
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Application
    Filed: May 5, 2023
    Publication date: October 5, 2023
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Publication number: 20230141038
    Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a first floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a second floating point data type.
    Type: Application
    Filed: October 6, 2022
    Publication date: May 11, 2023
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Dipankar Das
  • Publication number: 20230087364
    Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
    Type: Application
    Filed: November 30, 2022
    Publication date: March 23, 2023
    Applicant: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
  • Patent number: 11556772
    Abstract: One embodiment provides for a computing device comprising a parallel processor compute unit to perform a set of parallel integer compute operations; a ternarization unit including a weight ternarization circuit and an activation quantization circuit; wherein the weight ternarization circuit is to convert a weight tensor from a floating-point representation to a ternary representation including a ternary weight and a scale factor; wherein the activation quantization circuit is to convert an activation tensor from a floating-point representation to an integer representation; and wherein the parallel processor compute unit includes one or more circuits to perform the set of parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor.
    Type: Grant
    Filed: January 12, 2018
    Date of Patent: January 17, 2023
    Assignee: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
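
A sketch of the ternarization described in the last entry: the weight tensor becomes a ternary tensor plus a scale factor, and activations become int8. The 0.7 * mean|w| threshold is a common heuristic and an assumption, not taken from the patent.

```python
import numpy as np

def ternarize(w):
    """Convert a float weight tensor to (ternary values, scale factor)."""
    thresh = 0.7 * np.mean(np.abs(w))
    t = np.where(w > thresh, 1, np.where(w < -thresh, -1, 0)).astype(np.int8)
    nz = t != 0
    scale = float(np.mean(np.abs(w[nz]))) if nz.any() else 0.0
    return t, scale

def quantize_activation(a, qmax=127):
    """Convert a float activation tensor to an int8 representation."""
    s = np.max(np.abs(a)) / qmax or 1.0               # guard for all-zero input
    return np.clip(np.round(a / s), -qmax, qmax).astype(np.int8), s
```
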