Patents by Inventor Amit Gradstein

Amit Gradstein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Using Fuzzy-Jbit location of floating-point multiply-accumulate results

Patent number: 11016731

Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.

Type: Grant

Filed: March 29, 2019

Date of Patent: May 25, 2021

Assignee: Intel Corporation

Inventors: Amit Gradstein, Simon Rubanovich, Zeev Sperber
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO CONVERT TO 16-BIT FLOATING-POINT FORMAT

Publication number: 20210124581

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.

Type: Application

Filed: December 23, 2020

Publication date: April 29, 2021

Inventors: Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Raanan SADE, Menachem ADELMAN, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
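
As a rough illustration of the conversion described in this abstract, the sketch below narrows single-precision values to a 16-bit format by dropping the low 16 bits and rounding. It assumes the 16-bit format is bfloat16 and that rounding is round-to-nearest-even; the abstract states neither, and the function names are hypothetical.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_to_bf16_bits(x: float) -> int:
    """Narrow one single-precision value to 16 bits.

    Assumption: the target is bfloat16 (top 16 bits of the float32
    pattern) with round-to-nearest-even on the discarded bits.
    """
    bits = f32_bits(x)
    # NaN: truncate and force a mantissa bit so the result stays NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def convert_vector(src):
    """Convert each source element and store it at the same index."""
    return [f32_to_bf16_bits(x) for x in src]
```

For instance, `convert_vector([1.0, -2.0])` yields the bit patterns `0x3F80` and `0xC000`.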
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO CONVERT TO 16-BIT FLOATING-POINT FORMAT

Publication number: 20210124580

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.

Type: Application

Filed: December 23, 2020

Publication date: April 29, 2021

Inventors: Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Raanan SADE, Menachem ADELMAN, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator

Patent number: 10990397

Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.

Type: Grant

Filed: March 30, 2019

Date of Patent: April 27, 2021

Assignee: Intel Corporation

Inventors: Amit Gradstein, Simon Rubanovich, Sagi Meller, Zeev Sperber, Jose Yallouz, Robert Valentine
SYSTEMS AND METHODS FOR PERFORMING 16-BIT FLOATING-POINT VECTOR DOT PRODUCT INSTRUCTIONS

Publication number: 20210117194

Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.

Type: Application

Filed: December 23, 2020

Publication date: April 22, 2021

Inventors: Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Raanan SADE, Menachem ADELMAN, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
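
The multiply-and-accumulate behavior in this abstract can be modeled in a few lines of Python. This is only a sketch under assumptions the abstract does not confirm: the 16-bit elements are treated as bfloat16 bit patterns, two pairs feed each single-precision destination element, and the running sum is rounded to single precision after every addition. All names are hypothetical.

```python
import struct


def bf16_to_float(bits: int) -> float:
    """Widen a bfloat16 bit pattern (assumed format) to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_accumulate(dst, src1, src2, pairs=2):
    """For each destination element, add `pairs` products of 16-bit inputs.

    dst holds single-precision values; src1 and src2 hold 16-bit patterns
    and must contain len(dst) * pairs elements each.
    """
    out = []
    for i, acc in enumerate(dst):
        for j in range(pairs):
            k = i * pairs + j
            product = bf16_to_float(src1[k]) * bf16_to_float(src2[k])
            acc = to_f32(acc + product)
        out.append(acc)
    return out
```

With `dst = [1.0]`, `src1 = [0x3F80, 0x4000]` (1.0, 2.0) and `src2 = [0x4000, 0x4040]` (2.0, 3.0), the result is `[9.0]`.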
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSPOSE RECTANGULAR TILES

Publication number: 20210096822

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.

Type: Application

Filed: December 14, 2020

Publication date: April 1, 2021

Inventors: Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Bret TOLL, Jesus CORBAL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL
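
The row-to-column mapping described in this abstract can be shown with a small Python sketch. Matrices are modeled as lists of rows, and the function names are hypothetical; the sketch says nothing about how the hardware stores or moves tiles.

```python
def transpose_tile(src):
    """Write each row of a rectangular matrix into the matching column."""
    rows, cols = len(src), len(src[0])
    return [[src[r][c] for r in range(rows)] for c in range(cols)]


def transpose_pair(src1, src2):
    """Transpose two source matrices into two destination matrices."""
    return transpose_tile(src1), transpose_tile(src2)
```

A 2 by 3 source such as `[[1, 2, 3], [4, 5, 6]]` becomes the 3 by 2 destination `[[1, 4], [2, 5], [3, 6]]`.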
Systems and methods for performing 16-bit floating-point matrix dot product instructions

Patent number: 10963246

Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

Type: Grant

Filed: November 9, 2018

Date of Patent: March 30, 2021

Assignee: Intel Corporation

Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
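
The nibble-wise flow in this abstract can be sketched as follows. The sketch assumes unsigned 4-bit nibbles and signed 32-bit saturation of the accumulator, neither of which the abstract specifies; the helper names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def nibbles(dword: int):
    """Split a 32-bit doubleword into eight 4-bit nibbles, lowest first."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate(x: int) -> int:
    """Clamp to the signed 32-bit range (assumed saturation behavior)."""
    return max(INT32_MIN, min(INT32_MAX, x))


def tile_nibble_dot(c, a, b):
    """Accumulate nibble products of a (M by K) and b (K by N) into c (M by N)."""
    m_dim, k_dim, n_dim = len(a), len(b), len(b[0])
    out = [row[:] for row in c]
    for m in range(m_dim):
        for n in range(n_dim):
            for k in range(k_dim):
                # Eight products per doubleword pair, one per nibble position.
                products = [x * y for x, y in zip(nibbles(a[m][k]), nibbles(b[k][n]))]
                out[m][n] = saturate(out[m][n] + sum(products))
    return out
```

For example, with `a = [[0x21]]`, `b = [[0x43]]` and `c = [[5]]`, the nibble products are 1*3 and 2*4, so the result is `[[16]]`.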
Accelerator systems and methods for matrix operations

Patent number: 10942738

Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry.

Type: Grant

Filed: March 29, 2019

Date of Patent: March 9, 2021

Assignee: Intel Corporation

Inventors: Zeev Sperber, Amit Gradstein, Simon Rubanovich, Igor Yanover, Gavri Berger, Eyal Hadas, Saeed Kharouf, Ron Schneider, Sagi Meller, Jose Yallouz
APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS

Publication number: 20210049013

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.

Type: Application

Filed: November 2, 2020

Publication date: February 18, 2021

Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati
Systems and methods for performing instructions to transpose rectangular tiles

Patent number: 10866786

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.

Type: Grant

Filed: September 27, 2018

Date of Patent: December 15, 2020

Assignee: Intel Corporation

Inventors: Raanan Sade, Robert Valentine, Mark J. Charney, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Bret Toll, Jesus Corbal, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall
Processors, methods, systems, and instructions to generate sequences of integers in numerical order that differ by a constant stride

Patent number: 10866807

Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers in numerical order with all integers in consecutive positions differing by a constant stride of at least two. In an aspect, storing the result including the sequence of the at least four integers is performed without calculating the at least four integers using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.

Type: Grant

Filed: December 22, 2011

Date of Patent: December 15, 2020

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Seth Abraham, Robert Valentine, Zeev Sperber, Amit Gradstein
Apparatuses, methods, and systems for hashing instructions

Patent number: 10824428

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.

Type: Grant

Filed: March 29, 2019

Date of Patent: November 3, 2020

Assignee: Intel Corporation

Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati
SYSTEMS AND METHODS TO PERFORM FLOATING-POINT ADDITION WITH SELECTED ROUNDING

Publication number: 20200310756

Abstract: Disclosed embodiments relate to performing floating-point addition with selected rounding. In one example, a processor includes circuitry to decode and execute an instruction specifying locations of first and second floating-point (FP) sources, and an opcode indicating the processor is to: bring the FP sources into alignment by shifting a mantissa of the smaller source FP operand to the right by a difference between their exponents, generating rounding controls based on any bits that escape; simultaneously generate a sum of the FP sources and of the FP sources plus one, the sums having a fuzzy-Jbit format having an additional Jbit into which a carry-out, if any, select one of the sums based on the rounding controls, and generate a result comprising a mantissa-wide number of most-significant bits of the selected sum, starting with the most significant non-zero Jbit.

Type: Application

Filed: March 30, 2019

Publication date: October 1, 2020

Applicant: Intel Corporation

Inventors: Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Mrinmay DUTTA
APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS

Publication number: 20200310802

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described.

Type: Application

Filed: March 29, 2019

Publication date: October 1, 2020

Inventors: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati
USING FUZZY-JBIT LOCATION OF FLOATING-POINT MULTIPLY-ACCUMULATE RESULTS

Publication number: 20200310757

Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.

Type: Application

Filed: March 29, 2019

Publication date: October 1, 2020

Inventors: Amit GRADSTEIN, Simon RUBANOVICH, Zeev SPERBER
APPARATUSES, METHODS, AND SYSTEMS FOR TRANSPOSE INSTRUCTIONS OF A MATRIX OPERATIONS ACCELERATOR

Publication number: 20200310803

Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.

Type: Application

Filed: March 30, 2019

Publication date: October 1, 2020

Inventors: Amit Gradstein, Simon Rubanovich, Sagi Meller, Zeev Sperber, Jose Yallouz, Robert Valentine
ACCELERATOR SYSTEMS AND METHODS FOR MATRIX OPERATIONS

Publication number: 20200310794

Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry.

Type: Application

Filed: March 29, 2019

Publication date: October 1, 2020

Applicant: Intel Corporation

Inventors: Zeev Sperber, Amit Gradstein, Simon Rubanovich, Igor Yanover, Gavri Berger, Eyal Hadas, Saeed Kharouf, Ron Schneider, Sagi Meller, Jose Yallouz
INTERLEAVED PIPELINE OF FLOATING-POINT ADDERS

Publication number: 20200310793

Abstract: Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of a M by K first source matrix, a K by N second source matrix, and a M by N destination matrix, the opcode indicating execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypassing the accumulated sum to a subsequent pipeline instance.

Type: Application

Filed: March 29, 2019

Publication date: October 1, 2020

Applicant: Intel Corporation

Inventors: Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER
Processors, methods, systems, and instructions to generate sequences of integers in which integers in consecutive positions differ by a constant integer stride and where a smallest integer is offset from zero by an integer offset

Patent number: 10732970

Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.

Type: Grant

Filed: February 8, 2019

Date of Patent: August 4, 2020

Assignee: Intel Corporation

Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
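
The result this abstract describes, a sequence that starts at an offset and advances by a fixed stride, is easy to state in code. The sketch below is illustrative only: it assumes a positive stride so the sequence is in ascending order, and the function name and `count` parameter are hypothetical.

```python
def strided_sequence(offset: int, stride: int, count: int = 4):
    """Return `count` integers starting at `offset`, each `stride` apart."""
    if count < 4:
        raise ValueError("the abstract describes at least four integers")
    if stride <= 0:
        raise ValueError("this sketch assumes a positive stride")
    return [offset + i * stride for i in range(count)]
```

Calling `strided_sequence(3, 2)` gives `[3, 5, 7, 9]`: the smallest value differs from zero by the offset, and neighbors differ by the stride.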
Apparatus and method of improved packed integer permute instruction

Patent number: 10719316

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

Type: Grant

Filed: November 9, 2017

Date of Patent: July 21, 2020

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
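
The routing-plus-masking arrangement in this abstract can be pictured with a short Python model. It treats vector elements as list entries of one width and assumes one mask bit per output element, with masked-off positions either zeroed or taken from an optional merge vector; those choices, and every name here, are illustrative assumptions rather than details from the abstract.

```python
def masked_permute(src, indices, mask, merge=None):
    """Route src[indices[i]] to output position i, then apply a mask.

    mask: integer whose bit i enables output element i (assumed layout).
    merge: optional vector supplying values for masked-off positions;
           if omitted, masked-off positions are zeroed.
    """
    out = []
    for i, idx in enumerate(indices):
        if (mask >> i) & 1:
            out.append(src[idx % len(src)])
        else:
            out.append(0 if merge is None else merge[i])
    return out
```

For example, `masked_permute([10, 20, 30, 40], [3, 2, 1, 0], 0b0101)` returns `[40, 0, 20, 0]`.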
