Patents by Inventor Amit Gradstein

Amit Gradstein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210117194
    Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 22, 2021
    Inventors: Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Raanan SADE, Menachem ADELMAN, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
  • Publication number: 20210096822
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
    Type: Application
    Filed: December 14, 2020
    Publication date: April 1, 2021
    Inventors: Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Bret TOLL, Jesus CORBAL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL
  • Patent number: 10963246
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
    Type: Grant
    Filed: November 9, 2018
    Date of Patent: March 30, 2021
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 10942738
    Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: March 9, 2021
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Amit Gradstein, Simon Rubanovich, Igor Yanover, Gavri Berger, Eyal Hadas, Saeed Kharouf, Ron Schneider, Sagi Meller, Jose Yallouz
  • Publication number: 20210049013
    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.
    Type: Application
    Filed: November 2, 2020
    Publication date: February 18, 2021
    Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati
  • Patent number: 10866786
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: December 15, 2020
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Robert Valentine, Mark J. Charney, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Bret Toll, Jesus Corbal, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10866807
    Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers in numerical order with all integers in consecutive positions differing by a constant stride of at least two. In an aspect, storing the result including the sequence of the at least four integers is performed without calculating the at least four integers using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: December 15, 2020
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Seth Abraham, Robert Valentine, Zeev Sperber, Amit Gradstein
  • Patent number: 10824428
    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: November 3, 2020
    Assignee: Intel Corporation
    Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati
  • Publication number: 20200310802
    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.
    Type: Application
    Filed: March 29, 2019
    Publication date: October 1, 2020
    Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati
  • Publication number: 20200310757
    Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
    Type: Application
    Filed: March 29, 2019
    Publication date: October 1, 2020
    Inventors: Amit GRADSTEIN, Simon RUBANOVICH, Zeev SPERBER
  • Publication number: 20200310803
    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.
    Type: Application
    Filed: March 30, 2019
    Publication date: October 1, 2020
    Inventors: Amit Gradstein, Simon Rubanovich, Sagi Meller, Zeev Sperber, Jose Yallouz, Robert Valentine
  • Publication number: 20200310794
    Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry.
    Type: Application
    Filed: March 29, 2019
    Publication date: October 1, 2020
    Applicant: Intel Corporation
    Inventors: ZEEV SPERBER, Amit Gradstein, Simon Rubanovich, Igor Yanover, Gavri Berger, Eyal Hadas, Saeed Kharouf, Ron Schneider, Sagi Meller, Jose Yallouz
  • Publication number: 20200310793
    Abstract: Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of a M by K first source matrix, a K by N second source matrix, and a M by N destination matrix, the opcode indicating execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypassing the accumulated sum to a subsequent pipeline instance.
    Type: Application
    Filed: March 29, 2019
    Publication date: October 1, 2020
    Applicant: Intel Corporation
    Inventors: Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER
  • Publication number: 20200310756
    Abstract: Disclosed embodiments relate to performing floating-point addition with selected rounding. In one example, a processor includes circuitry to decode and execute an instruction specifying locations of first and second floating-point (FP) sources, and an opcode indicating the processor is to: bring the FP sources into alignment by shifting a mantissa of the smaller source FP operand to the right by a difference between their exponents, generating rounding controls based on any bits that escape; simultaneously generate a sum of the FP sources and of the FP sources plus one, the sums having a fuzzy-Jbit format having an additional Jbit into which a carry-out, if any, select one of the sums based on the rounding controls, and generate a result comprising a mantissa-wide number of most-significant bits of the selected sum, starting with the most significant non-zero Jbit.
    Type: Application
    Filed: March 30, 2019
    Publication date: October 1, 2020
    Applicant: Intel Corporation
    Inventors: Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Mrinmay DUTTA
  • Patent number: 10732970
    Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: August 4, 2020
    Assignee: Intel Corporation
    Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
  • Patent number: 10719316
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Grant
    Filed: November 9, 2017
    Date of Patent: July 21, 2020
    Assignee: INTEL CORPORATION
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Publication number: 20200210625
    Abstract: Integrated circuits to compute a result of summing m values, rotating the sum by k bits, and adding a summation of n values Bi to Bn to the rotated sum. An embodiment includes: a first carry save adder to add up the m values to generate a first carry and a first sum; rotator circuitry to rotate both the first carry and the first sum by k bits to generate a second carry and a second sum; a second carry save adder to add up the second carry, the second sum, and the summation of values Bi to Bn to generate a third carry and a third sum; two parallel adders to generate a first intermediate result and a second intermediary result based on the third carry and the third sum; and a multiplexer to generate the result utilizing various portions of the first and second intermediate results.
    Type: Application
    Filed: December 29, 2018
    Publication date: July 2, 2020
    Inventors: Amit Gradstein, Simon Rubanovich, Regev Shemy, Onkar P Desai, Jose Yallouz
  • Publication number: 20200201932
    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described.
    Type: Application
    Filed: December 28, 2019
    Publication date: June 25, 2020
    Inventors: Amit GRADSTEIN, Simon RUBANOVICH, Sagi MELLER, Saeed KHAROUF, Gavri BERGER, Zeev SPERBER, Jose YALLOUZ, Ron SCHNEIDER
  • Publication number: 20200201628
    Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.
    Type: Application
    Filed: December 2, 2019
    Publication date: June 25, 2020
    Inventors: Raanan SADE, Thierry PONS, Amit GRADSTEIN, Zeev SPERBER, Mark J. CHARNEY, Robert VALENTINE, Eyal Oz-Sinay
  • Patent number: 10649733
    Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.
    Type: Grant
    Filed: June 10, 2019
    Date of Patent: May 12, 2020
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan, Amit Gradstein