Patents by Inventor Amit Gradstein

Amit Gradstein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240045686
    Abstract: Techniques for converting FP8 data elements to FP16 or FP32 data elements using a single instruction are described. An example apparatus includes decoder circuitry to decode a single instruction, the single instruction to indicate that execution circuitry is to convert packed FP8 data from the identified source to packed half-precision floating-point data or single-precision floating point data and store the packed half-precision floating-point data or single-precision floating point data into corresponding data element positions of the identified destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045681
    Abstract: Techniques for comparing FP8 data elements are described. An exemplary FP8 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045691
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Naveen Mellempudi, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Alexander Heinecke, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045685
    Abstract: Systems, methods, and apparatuses relating sparsity based FMA. In some examples, an instance of a single FMA instruction has one or more fields for an opcode, one or more fields to identify a source/destination matrix operand, one or more fields to identify a first plurality of source matrix operands, one or more fields to identify a second plurality of matrix operands, wherein the opcode is to indicate that execution circuitry is to select a proper subset of FP8 data elements from the first plurality of source matrix operands based on sparsity controls from a first matrix operand of the second plurality of matrix operands and perform a FMA.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Menachem Adelman, Amit Gradstein, Alexander Heinecke, Christopher Hughes, Naveen Mellempudi, Shahar Mizrahi, Dana Rip, Simon Rubanovich, Uri Sherman, Guy Boudoukh, Evangelos Georganas, Nilesh Jain, Barukh Ziv
  • Publication number: 20240045689
    Abstract: Disclosed embodiments relate to systems and methods for performing 8-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply pairs of 8-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045684
    Abstract: Techniques for converting FP16 to BF8 using bias are described.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
  • Publication number: 20240045688
    Abstract: Techniques for performing FP8 FMA in response to an instruction are described. In some examples, an instruction has fields for an opcode, an identification of location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, an identification of a location of a third packed data source operand, and an identification of location of a packed data source/destination operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform a FP8 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand, wherein the FP8 value has an 8-bit floating point format that comprises one bit for a sign, at least 4 bits for an exponent, and at least two bits for a fraction.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Publication number: 20240045682
    Abstract: Techniques for scale and reduction of FP8 data elements are described. An exemplary instruction includes fields for an having fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a FP8 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a FP8 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.
    Type: Application
    Filed: October 1, 2022
    Publication date: February 8, 2024
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Patent number: 11893389
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
    Type: Grant
    Filed: March 27, 2023
    Date of Patent: February 6, 2024
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Publication number: 20240036865
    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.
    Type: Application
    Filed: June 17, 2023
    Publication date: February 1, 2024
    Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati
  • Publication number: 20240004662
    Abstract: Techniques for performing horizontal reductions are described. In some examples, an instance of a horizontal instruction is to include at least one field for an opcode, one or more fields to reference a first source operand, and one or more fields to reference a destination operand, wherein the opcode is to indicate that execution circuitry is, in response to a decoded instance of the single instruction, to at least perform a horizontal reduction using at least one data element of a non-masked data element position of at least the first source operand and store a result of the horizontal reduction in the destination operand.
    Type: Application
    Filed: July 2, 2022
    Publication date: January 4, 2024
    Inventors: Menachem ADELMAN, Amit GRADSTEIN, Regev SHEMY, Chitra NATARAJAN, Leonardo BORGES, Chytra SHIVASWAMY, Igor ERMOLAEV, Michael ESPIG, Or BEIT AHARON, Jeff WIEDEMEIER
  • Publication number: 20230418602
    Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand.
    Type: Application
    Filed: August 28, 2023
    Publication date: December 28, 2023
    Inventors: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein
  • Publication number: 20230409333
    Abstract: Techniques for performing prefix sums in response to a single instruction are describe are described. In some examples, the single instruction includes fields for an opcode, one or fields to reference a first source operand, one or fields to reference a second source operand, one or fields to reference a destination operand, wherein the opcode is to indicate that execution circuitry is, in response to a decoded instance of the single instruction, to at least: perform a prefix sum by for each non-masked data element position of the second source operand adding a data element of that data element position to each data element of preceding data element positions and adding at least one data element of a defined data element position of the first source operand, and store each prefix sum for each data element position of the second source operand into a corresponding data element position of the destination operand.
    Type: Application
    Filed: June 17, 2022
    Publication date: December 21, 2023
    Inventors: Menachem ADELMAN, Amit GRADSTEIN, Regev SHEMY, Chitra NATARAJAN, Igor ERMOLAEV
  • Publication number: 20230409326
    Abstract: Techniques and mechanisms for processor circuitry to execute a load and expand instruction of an instruction set to generate decompressed matrix data. In an embodiment, the instruction comprises a source operand which indicates a location from which compressed matrix data, and corresponding metadata, are to be accessed. A destination operand of the instruction indicates a location which is to receive decompressed metadata, which is generated, during execution of the instruction, based on the compressed matrix data and the corresponding metadata. The metadata comprises compression mask information which identifies which elements of the matrix have been masked from the compressed matrix data. In another embodiment, the instruction further comprises a count operand which identifies a total number of the unmasked matrix elements which are represented in the compressed matrix data.
    Type: Application
    Filed: June 15, 2022
    Publication date: December 21, 2023
    Applicant: Intel Corporation
    Inventors: Menachem Adelman, Amit Gradstein, Simon Rubanovich, Barukh Ziv, Uri Sherman, Dana Rip, Shahar Mizrahi, Dan Baum, Rinat Rappoport, Nilesh Jain, Zeev Sperber, Gideon Stupp, Alexander Heinecke, Christopher Hughes, Evangelos Georganas
  • Publication number: 20230385059
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
    Type: Application
    Filed: August 14, 2023
    Publication date: November 30, 2023
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20230376313
    Abstract: Techniques for instructions for min-max operations are described. An example apparatus comprises decoder circuitry to decode a single instruction, the single instruction to include fields for identifiers of a first source operand, a second source operand, an a destination operand, a field for an immediate operand, and a field for an opcode, the opcode to indicate execution circuitry is to perform a min-max operation, and execution circuitry to execute the decoded instruction according to the opcode to perform the min-max operation to determine a particular operation of five or more minimum and maximum operations in accordance with a value of the immediate operand, perform the determined particular operation on the identified first source operand and the identified second source operand to return a result, and store the result into the identified destination operand. Other examples are described and claimed.
    Type: Application
    Filed: May 18, 2022
    Publication date: November 23, 2023
    Applicant: Intel Corporation
    Inventors: Menachem Adelman, Amit Gradstein, Cristina Anderson, Marius Cornea-Hasegan
  • Patent number: 11816483
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: November 14, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 11809869
    Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: November 7, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 11789729
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: October 17, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 11782709
    Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand.
    Type: Grant
    Filed: October 13, 2022
    Date of Patent: October 10, 2023
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein