Patents by Inventor Dan Baum

Dan Baum has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12287843
    Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.
    Type: Grant
    Filed: November 6, 2023
    Date of Patent: April 29, 2025
    Assignee: Intel Corporation
    Inventors: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
  • Patent number: 12282773
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.
    Type: Grant
    Filed: December 8, 2023
    Date of Patent: April 22, 2025
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Jesus Corbal, Dan Baum, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil, Raanan Sade
  • Patent number: 12277234
    Abstract: A processor, a system, a machine readable medium, and a method.
    Type: Grant
    Filed: December 26, 2020
    Date of Patent: April 15, 2025
    Assignee: Intel Corporation
    Inventors: David M. Durham, Michael D. LeMay, Salmin Sultana, Karanvir S. Grewal, Michael E. Kounavis, Sergej Deutsch, Andrew James Weiler, Abhishek Basak, Dan Baum, Santosh Ghosh
  • Publication number: 20250117221
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for a matrix transpose instruction is detailed. In some embodiments, decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to transpose each row of elements of the identified source matrix operand into a corresponding column of the identified destination matrix operand are detailed.
    Type: Application
    Filed: October 18, 2024
    Publication date: April 10, 2025
    Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Barukh ZIV, Alexander HEINECKE, Milind GIRKAR, Menachem ADELMAN, Simon RUBANOVICH
  • Publication number: 20250117222
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.
    Type: Application
    Filed: October 29, 2024
    Publication date: April 10, 2025
    Inventors: Robert VALENTINE, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Rinat RAPPOPORT, Stanislav SHWARTSMAN, Dan BAUM, Igor YANOVER, Elmoustapha OULD-AHMED-VALL, Menachem ADELMAN, Jesus CORBAL, Yuri GEBIL, Simon RUBANOVICH
  • Patent number: 12265826
    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
    Type: Grant
    Filed: December 28, 2023
    Date of Patent: April 1, 2025
    Assignee: Intel Corporation
    Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
  • Patent number: 12260213
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: March 25, 2025
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Dan Baum, Zeev Sperber, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Mark J. Charney, Barukh Ziv, Alexander Heinecke, Milind Girkar, Simon Rubanovich
  • Publication number: 20250068424
    Abstract: Methods, apparatus, and computer programs are disclosed for data/instruction access based on performance hints. In some embodiments, a method comprises decoding an instruction to access data or code by a core of a computer processor, the instruction to provide one or more hints on how the data or code is to be processed through a cache hierarchy of the computer processor based on the instruction, the one or more hints indicating which level of the cache hierarchy or which cache in a level of the cache hierarchy to load or store the data or code, a priority of the data or code in a cache, or how the data or code is to be shared among multiple cores of the computer processor. The method further comprises processing the data or code based on the one or more hints responsive to the decoded instruction.
    Type: Application
    Filed: November 8, 2024
    Publication date: February 27, 2025
    Inventors: Duane GALBI, Christopher J. HUGHES, Dan BAUM
  • Publication number: 20250068422
    Abstract: Methods, apparatus, and computer programs are disclosed for context switching. In some embodiments, a method comprises dedicating a first subset of a plurality of vector registers to a first thread of a plurality of threads for thread execution; and responsive to a context switch from the first thread to a second thread, bypassing saving a state of the first subset of the plurality of vector registers; and saving a state of a second subset of the plurality of vector registers, wherein the second subset of the plurality of vector registers is not dedicated to the first thread, and wherein the first and second subsets are mutually exclusive.
    Type: Application
    Filed: November 8, 2024
    Publication date: February 27, 2025
    Inventors: Duane GALBI, Christopher J. HUGHES, Dan BAUM, H. Peter ANVIN, Stijn EYERMAN
  • Publication number: 20250004716
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
    Type: Application
    Filed: May 3, 2024
    Publication date: January 2, 2025
    Inventors: Robert VALENTINE, Menachem ADELMAN, Milind B. GIRKAR, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Rinat RAPPOPORT, Jesus Corbal, Stanislav SHWARTSMAN, Dan BAUM, Igor YANOVER, Alexander F. HEINECKE, Barukh ZIV, Elmoustapha OULD-AHMED-VALL, Yuri GEBIL, Raanan SADE
  • Publication number: 20250004721
    Abstract: A method of an aspect includes multiplying pairs of corresponding small-exponent floating-point data elements to generate corresponding small-exponent floating-point products. The small-exponent floating-point data elements and the small-exponent floating-point products each have no more than six exponent bits. The method also includes converting the small-exponent floating-point products to signed fixed-point products and accumulating the signed fixed-point products, and an optional signed fixed-point accumulation value, by fixed-point addition to generate a signed fixed-point accumulation value. Other methods, processors, systems, and instructions are disclosed.
    Type: Application
    Filed: September 10, 2024
    Publication date: January 2, 2025
    Inventors: Simon Rubanovich, Amit Gradstein, Sagi Meller, Uri Reuven Tassa, Dan Baum
  • Publication number: 20250004765
    Abstract: Techniques for loading data with a hint related to data sharing with other cores. For example, one embodiment of an apparatus comprises: a plurality of cores to process instructions; a first core of the plurality of cores comprising: decoder circuitry to decode a single instruction, the single instruction having a first field for an opcode to indicate a load operation to read data from a memory, a second field to indicate a memory address for a location of the data in the memory, and a third field to store a value to indicate whether the data is expected to be shared between the first core and at least a second core of the plurality of cores; execution circuitry to execute the single instruction to read the data from the location in the memory; and cache controller circuitry to store the data in one or more caches in a state selected based on the value.
    Type: Application
    Filed: June 30, 2023
    Publication date: January 2, 2025
    Inventors: Christopher J. HUGHES, Zhe WANG, Dan BAUM, Venkateswara Rao MADDURI, Alexander HEINECKE, Evangelos GEORGANAS, Chen DAN, Joseph NUZMAN
  • Publication number: 20250004768
    Abstract: Decoder circuitry to decode an instruction indicating a first vector register having a 128-bit lane to store a first matrix having two rows by K columns of data elements having a number of bits, a storage location having 128 bits to store a second matrix having K rows by two columns of data elements having the number of bits, and a second vector register having a 128-bit lane to store a third matrix having two rows by two columns of data elements having a greater number of bits. Execution circuitry is to perform operations for the instruction, including to generate and store a result matrix having two rows by two columns of result data elements having the greater number of bits in 128-bit lane of second vector register. The result matrix represents accumulation of the third matrix with product matrix generated from matrix multiplication using the first and second matrices.
    Type: Application
    Filed: June 30, 2023
    Publication date: January 2, 2025
    Inventors: Alexander HEINECKE, Wing Shek WONG, Stephen ROBINSON, Raanan SADE, Amit GRADSTEIN, Simon RUBANOVICH, Michael ESPIG, Dan BAUM, Evangelos GEORGANAS, Dhiraj KALAMKAR
  • Publication number: 20250004773
    Abstract: An apparatus and method are described for prefetching data with hints. For example, one embodiment of a processor comprises: a plurality of cores to process instructions; a first core of the plurality of cores comprising: decoder circuitry to decode instructions indicating memory operations including load operations of a first type with shared data hints and load operations of a second type without shared data hints; execution circuitry to execute the instructions to perform the memory operations; data prefetch circuitry to store tracking data in a tracking data structure responsive to the memory operations, a portion of the tracking data associated with the first type of load operations; and the data prefetch circuitry to detect memory access patterns using the tracking data, the data prefetch circuitry to responsively issue one or more prefetch operations using shared data hints based, at least in part, on the portion of the tracking data associated with the first type of load operations.
    Type: Application
    Filed: June 30, 2023
    Publication date: January 2, 2025
    Inventors: Christopher J. HUGHES, Zhe WANG, Dan BAUM, Venkateswara Rao MADDURI, Chen DAN, Joseph NUZMAN
  • Patent number: 12182571
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
    Type: Grant
    Filed: January 23, 2023
    Date of Patent: December 31, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Menachem Adelman, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Dan Baum, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil, Raanan Sade
  • Patent number: 12175246
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: September 1, 2023
    Date of Patent: December 24, 2024
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
  • Patent number: 12147804
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.
    Type: Grant
    Filed: July 22, 2021
    Date of Patent: November 19, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Stanislav Shwartsman, Dan Baum, Igor Yanover, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Jesus Corbal, Yuri Gebil, Simon Rubanovich
  • Publication number: 20240354108
    Abstract: Techniques for implementing instructions and modified instruction encodings for checking tags and for interspersing islands of tags in line with bucketed data for locality by a processor are described. In an example, an apparatus includes decoder circuitry and execution circuitry. The decoder circuitry is to decode an instruction into a decoded instruction. The instruction has an opcode to indicate that the execution circuitry is to use metadata and instruction encodings to selectively perform a memory safety check. The execution circuitry is to execute the decoded instruction according to the opcode.
    Type: Application
    Filed: September 29, 2023
    Publication date: October 24, 2024
    Applicant: Intel Corporation
    Inventors: Michael LeMay, David M. Durham, Joseph Cihula, Joseph Nuzman, Dan Baum, Jonathan Combs
  • Patent number: 12124847
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for a matrix transpose instruction is detailed. In some embodiments, decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to transpose each row of elements of the identified source matrix operand into a corresponding column of the identified destination matrix operand are detailed.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: October 22, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Dan Baum, Zeev Sperber, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Bret L Toll, Mark J. Charney, Barukh Ziv, Alexander Heinecke, Milind Girkar, Menachem Adelman, Simon Rubanovich
  • Publication number: 20240329994
    Abstract: Techniques for converting floating-point to integer are described. An example of an instruction to perform such a conversion includes fields for an opcode, an identification of location of a packed data source operand, an identification of location of a packed data destination operand, an indication of a location in each packed data element of the packed data destination to store an 8-bit integer (INT8) value, wherein the opcode is to indicate to conversion circuitry is to downconvert data of each packed data element of the packed data source operand to an INT8 value and make available for storage the INT8 value in the identified location of a corresponding packed data element of the packed data destination.
    Type: Application
    Filed: March 30, 2023
    Publication date: October 3, 2024
    Inventors: Uri Sherman, Dan Baum, Menachem Adelman, Amit Gradstein