Patents by Inventor Alexander Heinecke
Alexander Heinecke has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250004768
Abstract: Decoder circuitry to decode an instruction indicating a first vector register having a 128-bit lane to store a first matrix having two rows by K columns of data elements having a number of bits, a storage location having 128 bits to store a second matrix having K rows by two columns of data elements having the number of bits, and a second vector register having a 128-bit lane to store a third matrix having two rows by two columns of data elements having a greater number of bits. Execution circuitry is to perform operations for the instruction, including to generate and store a result matrix having two rows by two columns of result data elements having the greater number of bits in the 128-bit lane of the second vector register. The result matrix represents accumulation of the third matrix with a product matrix generated from matrix multiplication using the first and second matrices.
Type: Application
Filed: June 30, 2023
Publication date: January 2, 2025
Inventors: Alexander HEINECKE, Wing Shek WONG, Stephen ROBINSON, Raanan SADE, Amit GRADSTEIN, Simon RUBANOVICH, Michael ESPIG, Dan BAUM, Evangelos GEORGANAS, Dhiraj KALAMKAR
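The accumulation described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the claimed circuitry: the function name `mma_2x2` is hypothetical, the inputs are assumed to be signed 8-bit integers (so K = 8 fills a 128-bit lane, since 2 x 8 x 8 bits = 128), and the accumulator elements are assumed to be 32 bits wide (2 x 2 x 32 bits = 128).

```python
def mma_2x2(a, b, c):
    """Return c + a @ b for a (2 x K), b (K x 2), c (2 x 2).

    Assumption: a and b hold signed 8-bit values and c holds wider
    (for example 32-bit) values; Python ints do not overflow, so the
    widths are only checked, not enforced by wrapping.
    """
    k = len(a[0])
    assert len(a) == 2 and len(b) == k and len(c) == 2
    assert all(-128 <= v <= 127 for row in a + b for v in row)
    result = [[c[i][j] for j in range(2)] for i in range(2)]
    for i in range(2):          # row of the first matrix
        for j in range(2):      # column of the second matrix
            for p in range(k):  # shared dimension K
                result[i][j] += a[i][p] * b[p][j]
    return result


# Usage: K = 8 columns of 8-bit elements per row.
a = [[1] * 8, [2] * 8]
b = [[1, 2] for _ in range(8)]
c = [[10, 20], [30, 40]]
print(mma_2x2(a, b, c))
```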
-
Publication number: 20250004765
Abstract: Techniques for loading data with a hint related to data sharing with other cores. For example, one embodiment of an apparatus comprises: a plurality of cores to process instructions; a first core of the plurality of cores comprising: decoder circuitry to decode a single instruction, the single instruction having a first field for an opcode to indicate a load operation to read data from a memory, a second field to indicate a memory address for a location of the data in the memory, and a third field to store a value to indicate whether the data is expected to be shared between the first core and at least a second core of the plurality of cores; execution circuitry to execute the single instruction to read the data from the location in the memory; and cache controller circuitry to store the data in one or more caches in a state selected based on the value.
Type: Application
Filed: June 30, 2023
Publication date: January 2, 2025
Inventors: Christopher J. HUGHES, Zhe WANG, Dan BAUM, Venkateswara Rao MADDURI, Alexander HEINECKE, Evangelos GEORGANAS, Chen DAN, Joseph NUZMAN
-
Patent number: 12182568
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
Type: Grant
Filed: August 14, 2023
Date of Patent: December 31, 2024
Assignee: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
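The per-element flow in this abstract (eight nibble products per doubleword pair, accumulated with saturation, repeated K times) might look roughly like the Python sketch below. The names `nibbles`, `saturate_i32`, and `tile_dot_product_nibbles` are hypothetical. The abstract does not say whether nibbles are signed or how wide the saturation range is, so the sketch assumes unsigned 4-bit nibbles and a signed 32-bit saturating accumulator.

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def nibbles(dword):
    """Split a 32-bit doubleword into eight unsigned 4-bit nibbles."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate_i32(value):
    """Clamp to the signed 32-bit range (an assumption of this sketch)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def tile_dot_product_nibbles(c, a, b):
    """Update c (M x N) in place from a (M x K) and b (K x N)."""
    m_rows, k_dim, n_cols = len(a), len(b), len(b[0])
    for m in range(m_rows):
        for n in range(n_cols):
            for k in range(k_dim):  # the flow runs K times per element
                products = [x * y for x, y in
                            zip(nibbles(a[m][k]), nibbles(b[k][n]))]
                c[m][n] = saturate_i32(c[m][n] + sum(products))
    return c


# Usage: every nibble of a is 1 and every nibble of b is 2.
print(tile_dot_product_nibbles([[5]], [[0x11111111]], [[0x22222222]]))
```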
-
Patent number: 12162480
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed herein that mitigate hard-braking events. An example apparatus includes at least one memory; instructions; and processor circuitry to execute the instructions to: determine a danger level associated with an object, the danger level indicative of a first measure of damage corresponding to a trajectory of the object compared to a trajectory of a vehicle; determine, based on the danger level, a danger measure based on at least one of a position of the object, a velocity of the object, an acceleration of the object, a direction of travel of the object, a weight or mass of the object; and generate instructions to transmit to a steering system or a braking system of the vehicle based on the determination.
Type: Grant
Filed: February 2, 2023
Date of Patent: December 10, 2024
Assignee: INTEL CORPORATION
Inventors: Alexander Heinecke, Sara Baghsorkhi, Justin Gottschlich, Mohammad Mejbah Ul Alam, Shengtian Zhou, Jeffrey Ota
-
Patent number: 12135968
Abstract: Techniques for converting FP16 to BF8 using bias are described.
Type: Grant
Filed: December 26, 2020
Date of Patent: November 5, 2024
Assignee: Intel Corporation
Inventors: Alexander Heinecke, Naveen Mellempudi, Robert Valentine, Mark Charney, Christopher Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
-
Patent number: 12124847
Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for a matrix transpose instruction is detailed. In some embodiments, decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to transpose each row of elements of the identified source matrix operand into a corresponding column of the identified destination matrix operand are detailed.
Type: Grant
Filed: July 1, 2017
Date of Patent: October 22, 2024
Assignee: Intel Corporation
Inventors: Robert Valentine, Dan Baum, Zeev Sperber, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Bret L Toll, Mark J. Charney, Barukh Ziv, Alexander Heinecke, Milind Girkar, Menachem Adelman, Simon Rubanovich
-
Publication number: 20240320001
Abstract: Detailed herein are embodiment systems, processors, and methods for matrix move. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to move each data element of the identified source matrix operand to corresponding data element position of the identified destination matrix operand is described.
Type: Application
Filed: May 14, 2024
Publication date: September 26, 2024
Inventors: Robert VALENTINE, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Jesus CORBAL, Dan BAUM, Alexander HEINECKE, Elmoustapha OULD-AHMED-VALL
-
Publication number: 20240248720
Abstract: Techniques for converting FP16 data elements to BF8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
Type: Application
Filed: April 5, 2024
Publication date: July 25, 2024
Inventors: Alexander Heinecke, Naveen Mellempudi, Robert Valentine, Mark Charney, Christopher Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
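A packed FP16-to-BF8 conversion can be sketched in Python as follows. The abstract does not give the BF8 bit layout or the rounding mode, so this sketch assumes that bfloat8 has 1 sign bit, 5 exponent bits, and 2 fraction bits (which makes it the upper byte of an IEEE half-precision value) and that rounding is round-to-nearest-even. The function names are hypothetical.

```python
import struct


def fp16_bits(x):
    """Return the 16-bit IEEE half-precision encoding of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_to_bf8(x):
    """Convert one value to an assumed 1-5-2 bfloat8 byte.

    Assumption: round-to-nearest-even on the 8 discarded fraction bits.
    """
    bits = fp16_bits(x)
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0x1F and fraction != 0:
        # NaN: keep it a NaN by forcing a fraction bit on.
        return (bits >> 8) | 0x02
    lsb = (bits >> 8) & 1  # lowest bit that survives the narrowing
    return (bits + 0x7F + lsb) >> 8


def convert_packed(source):
    """Convert each element and keep its data element position."""
    return [fp16_to_bf8(x) for x in source]


# Usage: 1.125 and 1.375 are both exact ties between two BF8 values.
print([hex(b) for b in convert_packed([1.0, 1.125, 1.375, -2.0, 65504.0])])
```

Under these assumptions a carry out of the fraction moves into the exponent, so the largest finite FP16 value rounds to the BF8 infinity encoding.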
-
Patent number: 12039332
Abstract: Detailed herein are embodiment systems, processors, and methods for matrix move. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to move each data element of the identified source matrix operand to corresponding data element position of the identified destination matrix operand is described.
Type: Grant
Filed: January 28, 2022
Date of Patent: July 16, 2024
Assignee: Intel Corporation
Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Jesus Corbal, Dan Baum, Alexander Heinecke, Elmoustapha Ould-Ahmed-Vall
-
Publication number: 20240211252
Abstract: Embodiments detailed herein relate to arithmetic operations of floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, values of which being in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
Type: Application
Filed: January 2, 2024
Publication date: June 27, 2024
Inventors: Gregory HENRY, Alexander HEINECKE
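One way to picture the convert, compute, and convert-back sequence in this abstract is the Python sketch below. It is an illustration only: the abstract does not specify the lower-precision format, so the sketch assumes each operand is split into a sign, one stored exponent, and three unsigned 8-bit integer chunks of a truncated 24-bit significand, and it uses multiplication as the arithmetic operation. The names `split` and `multiply_via_chunks` are hypothetical.

```python
import math

PIECES = 3  # assumed number of lower-precision values per operand
BITS = 8    # assumed width of each lower-precision value


def split(x):
    """Return (sign, exponent, chunks) so that
    x ~= sign * int_from_chunks * 2**(exponent - PIECES * BITS)."""
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent
    sign = -1 if mantissa < 0 else 1
    scaled = int(abs(mantissa) * (1 << (PIECES * BITS)))  # truncates
    chunks = [(scaled >> (BITS * (PIECES - 1 - i))) & ((1 << BITS) - 1)
              for i in range(PIECES)]
    return sign, exponent, chunks


def multiply_via_chunks(a, b):
    """Multiply using only products of the small integer chunks."""
    sign_a, exp_a, chunks_a = split(a)
    sign_b, exp_b, chunks_b = split(b)
    acc = 0
    for i, x in enumerate(chunks_a):
        for j, y in enumerate(chunks_b):
            shift = BITS * ((PIECES - 1 - i) + (PIECES - 1 - j))
            acc += (x * y) << shift  # low-precision product, aligned
    # Convert the integer result back to floating point.
    return sign_a * sign_b * math.ldexp(acc, exp_a + exp_b - 2 * PIECES * BITS)


# Usage: both inputs fit in the first chunk, so the result is exact.
print(multiply_via_chunks(1.5, 2.25))
```

Inputs whose significand needs more than 24 bits are truncated by `split`, so the result is then only an approximation of the double-precision product.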
-
Patent number: 12020000
Abstract: Systems and methods include arithmetic circuitry that generates a floating-point mantissa and includes a propagation network that calculates the floating-point mantissa based on input bits. The systems and methods also include rounding circuitry that rounds the floating-point mantissa. The rounding circuitry includes a multiplexer at a rounding location for the floating-point mantissa that selectively inputs a first input bit of the input bits or a rounding bit. The rounding circuitry also includes an OR gate that ORs a second input bit of the input bits with the rounding bit. Moreover, the second input bit is a less significant bit than the first input bit.
Type: Grant
Filed: December 24, 2020
Date of Patent: June 25, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Alexander Heinecke
-
Publication number: 20240184585
Abstract: Techniques for comparing BF16 data elements are described. An exemplary BF16 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.
Type: Application
Filed: February 8, 2024
Publication date: June 6, 2024
Inventors: Alexander HEINECKE, Menachem ADELMAN, Robert VALENTINE, Zeev SPERBER, Amit GRADSTEIN, Mark CHARNEY, Evangelos GEORGANAS, Dhiraj KALAMKAR, Christopher HUGHES, Cristina ANDERSON
-
Publication number: 20240143325
Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
Type: Application
Filed: November 3, 2023
Publication date: May 2, 2024
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
-
Publication number: 20240143328
Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
Type: Application
Filed: November 2, 2023
Publication date: May 2, 2024
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
-
Publication number: 20240134644
Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
Type: Application
Filed: December 29, 2023
Publication date: April 25, 2024
Applicant: Intel Corporation
Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Barukh ZIV, Alexander HEINECKE, Milind GIRKAR, Simon RUBANOVICH
-
Publication number: 20240045654
Abstract: Techniques for performing arithmetic operations on FP8 values are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on FP8 data elements in that data element position in FP8 format and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand.
Type: Application
Filed: October 1, 2022
Publication date: February 8, 2024
Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
-
Publication number: 20240045682
Abstract: Techniques for scale and reduction of FP8 data elements are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a FP8 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a FP8 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.
Type: Application
Filed: October 1, 2022
Publication date: February 8, 2024
Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
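The scale operation in this abstract computes, per element position, the first source element multiplied by 2 raised to the floor of the second source element. A minimal Python sketch of that arithmetic follows. It works on ordinary Python floats and deliberately ignores FP8 rounding, range limits, and special values, none of which the abstract specifies; the function name `scale_packed` is hypothetical.

```python
import math


def scale_packed(first, second):
    """For each position i, return first[i] * 2**floor(second[i]).

    Assumption: plain Python floats stand in for FP8 data elements,
    so no FP8 rounding or overflow handling is modeled here.
    """
    assert len(first) == len(second)
    return [math.ldexp(a, math.floor(b)) for a, b in zip(first, second)]


# Usage: floor(2.7) is 2 and floor(-1.2) is -2.
print(scale_packed([1.5, 3.0], [2.7, -1.2]))
```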
-
Publication number: 20240045688
Abstract: Techniques for performing FP8 FMA in response to an instruction are described. In some examples, an instruction has fields for an opcode, an identification of location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, an identification of a location of a third packed data source operand, and an identification of location of a packed data source/destination operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform a FP8 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand, wherein the FP8 value has an 8-bit floating point format that comprises one bit for a sign, at least 4 bits for an exponent, and at least two bits for a fraction.
Type: Application
Filed: October 1, 2022
Publication date: February 8, 2024
Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
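The 8-bit layout named in this abstract (one sign bit, at least 4 exponent bits, at least 2 fraction bits) admits splits such as 1-5-2 and 1-4-3. The Python sketch below decodes a byte under either split. It assumes an IEEE-style encoding with bias 2^(exp_bits - 1) - 1, subnormals, and an all-ones exponent reserved for infinity and NaN; the abstract does not confirm these conventions, and some FP8 variants treat the all-ones exponent differently. The function name `decode_fp8` is hypothetical.

```python
import math


def decode_fp8(byte, exp_bits=5, man_bits=2):
    """Decode one FP8 byte to a Python float (IEEE-style assumption)."""
    assert 1 + exp_bits + man_bits == 8
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exponent = (byte >> man_bits) & ((1 << exp_bits) - 1)
    fraction = byte & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exponent == (1 << exp_bits) - 1:
        # Assumed special values: infinity or NaN.
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:
        # Subnormal: no implicit leading one.
        return sign * math.ldexp(fraction, 1 - bias - man_bits)
    return sign * math.ldexp((1 << man_bits) + fraction,
                             exponent - bias - man_bits)


# Usage: the same value 1.0 under the two splits.
print(decode_fp8(0x3C, exp_bits=5, man_bits=2))
print(decode_fp8(0x38, exp_bits=4, man_bits=3))
```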
-
Publication number: 20240045686
Abstract: Techniques for converting FP8 data elements to FP16 or FP32 data elements using a single instruction are described. An example apparatus includes decoder circuitry to decode a single instruction, the single instruction to indicate that execution circuitry is to convert packed FP8 data from the identified source to packed half-precision floating-point data or single-precision floating point data and store the packed half-precision floating-point data or single-precision floating point data into corresponding data element positions of the identified destination operand.
Type: Application
Filed: October 1, 2022
Publication date: February 8, 2024
Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
-
Publication number: 20240045684
Abstract: Techniques for converting FP16 to BF8 using bias are described.
Type: Application
Filed: October 1, 2022
Publication date: February 8, 2024
Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine