Patents by Inventor Michael Espig

Michael Espig has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240220248
    Abstract: Techniques to restrict vector length in a processor are described. A method of an aspect that may be performed by a processor includes executing first instances of vector instructions having respective opcode values regardless of whether they specify wider vectors of a wider vector width or narrower vectors of a narrower vector width, when a control value is a first value. The method also includes executing second instances of vector instructions having the respective opcode values when they specify narrower vectors of the narrower vector width, but do not specify wider vectors of the wider vector width, when the control value is a second different value. The method also includes preventing execution of third instances of vector instructions having the respective opcode values when they specify wider vectors of the wider vector width, when the control value is the second value. Other methods, processors, and systems are disclosed.
    Type: Application
    Filed: December 29, 2022
    Publication date: July 4, 2024
    Inventors: Vivekananthan SANJEEPAN, Gilbert NEIGER, Michael ESPIG
  • Publication number: 20240103865
    Abstract: Techniques for using and/or supporting multiplication with add and/or subtract instructions with an intermediate (after multiplication) round are described. In some examples, an instruction is supported that has at least one or more fields for an opcode and location information for three packed data source operands, wherein the opcode indicates that execution circuitry is to perform, per packed data element position, a multiplication, a round, an addition and/or subtraction, and a round, using the three packed data source operands, and to store the result into a corresponding packed data element position of an identified destination location. Which packed data element positions are added and which are subtracted is defined by the opcode.
    Type: Application
    Filed: March 30, 2023
    Publication date: March 28, 2024
    Inventors: Michael ESPIG, Mikko BYCKLING, Maxim LOKTYUKHIN, Dmitry Yurievich BABOKIN, Amit GRADSTEIN, Deepti AGGARWAL
  • Publication number: 20240103872
    Abstract: Techniques for performing floating-point to integer conversion with saturation are described. In some examples, an instruction is executed to perform the conversion. In some examples, a single instruction includes at least one or more fields for an opcode and one or more fields for location information for at least a first source operand and a destination operand, wherein the opcode indicates that execution circuitry is to convert, using truncation or saturation, each floating-point data element of at least the first source operand to an integer value and store the integer value into a corresponding data element position of the destination operand, wherein truncation is used when a conversion is inexact and saturation is used when a conversion overflows.
    Type: Application
    Filed: March 29, 2023
    Publication date: March 28, 2024
    Inventors: John MORGAN, Deepti AGGARWAL, Michael ESPIG
  • Publication number: 20240103866
    Abstract: Detailed herein are examples of instructions and their hardware support for floating-point comparison that use the distinction between signed and unsigned integer comparison to make an analogous distinction between floating-point relations that include the unordered case and those that do not. These instructions may reduce the number of instructions required to compare values and conditionally execute operations in a program, including instructions that load values and instructions that explicitly test for the unordered condition.
    Type: Application
    Filed: July 1, 2023
    Publication date: March 28, 2024
    Inventors: John MORGAN, Deepti AGGARWAL, Michael ESPIG, H. Peter ANVIN
  • Publication number: 20240078285
    Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
    Type: Application
    Filed: November 6, 2023
    Publication date: March 7, 2024
    Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20240045690
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Application
    Filed: September 1, 2023
    Publication date: February 8, 2024
    Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
  • Patent number: 11886875
    Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction and decode circuitry to decode the fetched instruction, the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
    Type: Grant
    Filed: December 26, 2018
    Date of Patent: January 30, 2024
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Jonathan D. Pearce, Dan Baum, Guei-Yuan Lueh, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
  • Publication number: 20240004662
    Abstract: Techniques for performing horizontal reductions are described. In some examples, an instance of a horizontal instruction is to include at least one field for an opcode, one or more fields to reference a first source operand, and one or more fields to reference a destination operand, wherein the opcode is to indicate that execution circuitry is, in response to a decoded instance of the horizontal instruction, to at least perform a horizontal reduction using at least one data element of a non-masked data element position of at least the first source operand and store a result of the horizontal reduction in the destination operand.
    Type: Application
    Filed: July 2, 2022
    Publication date: January 4, 2024
    Inventors: Menachem ADELMAN, Amit GRADSTEIN, Regev SHEMY, Chitra NATARAJAN, Leonardo BORGES, Chytra SHIVASWAMY, Igor ERMOLAEV, Michael ESPIG, Or BEIT AHARON, Jeff WIEDEMEIER
  • Publication number: 20240004648
    Abstract: Techniques for vector unpacking are described. In some examples a single instruction is executed to perform vector unpacking.
    Type: Application
    Filed: July 2, 2022
    Publication date: January 4, 2024
    Inventors: Venkateswara Rao MADDURI, Jason BRANDT, Jeff WIEDEMEIER, Michael ESPIG
  • Patent number: 11847185
    Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
  • Patent number: 11836464
    Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.
    Type: Grant
    Filed: June 14, 2022
    Date of Patent: December 5, 2023
    Assignee: Intel Corporation
    Inventors: Aditya Varma, Michael Espig
  • Patent number: 11748103
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: February 15, 2022
    Date of Patent: September 5, 2023
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
  • Publication number: 20230185873
    Abstract: Methods and apparatus relating to separable convolution filter operations on matrix multiplication arrays are described. In an embodiment, logic circuitry generates a first convolution kernel and a second convolution kernel based on a two-dimensional convolution kernel. A matrix processing array comprising a plurality of Fused Multiply-Add (FMA) blocks applies the first convolution kernel to input data during a first pass to generate intermediate data, and the matrix processing array applies the second convolution kernel to the intermediate data to generate output data. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 10, 2021
    Publication date: June 15, 2023
    Applicant: Intel Corporation
    Inventors: Michael Espig, Deepti Aggarwal
  • Publication number: 20220413853
    Abstract: Systems, methods, and apparatuses to support packed data convolution instructions with shift control and width control are described.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Inventors: Deepti AGGARWAL, Michael ESPIG, Robert VALENTINE, Sumit MOHAN, Prakaram JOSHI, Richard WINTERTON
  • Patent number: 11507376
    Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure-of-Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified SOA destination matrices; the AOS source matrix is to contain N structures, each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride; and the SOA destination matrices together contain K segregated groups, each containing N same-typed elements. The processor further includes decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
    Type: Grant
    Filed: January 19, 2021
    Date of Patent: November 22, 2022
    Assignee: Intel Corporation
    Inventors: Bret Toll, Alexander F. Heinecke, Christopher J. Hughes, Ronen Zohar, Michael Espig, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20220365751
    Abstract: An embodiment of an apparatus comprises one or more fractional width fused multiply-accumulate (FMA) circuits configured as a shared Wallace tree, and circuitry coupled to the one or more fractional width FMA circuits to provide one or more fractional width FMA operations through the one or more fractional width FMA circuits. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: June 25, 2021
    Publication date: November 17, 2022
    Applicant: Intel Corporation
    Inventors: Aditya Varma, Mahesh Kumashikar, Michael Espig
  • Publication number: 20220342641
    Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.
    Type: Application
    Filed: June 14, 2022
    Publication date: October 27, 2022
    Inventors: Aditya Varma, Michael Espig
  • Publication number: 20220308881
    Abstract: In an embodiment, a processor includes: a fetch circuit to fetch instructions, the instructions including a sum of absolute differences (SAD) instruction; a decode circuit to decode the SAD instruction; and an execution circuit to, during an execution of the decoded SAD instruction, generate an SAD output vector based on a plurality of input vectors, the SAD output vector including a plurality of absolute difference values. Other embodiments are described and claimed.
    Type: Application
    Filed: March 26, 2021
    Publication date: September 29, 2022
    Inventors: Deepti Aggarwal, Michael Espig, Robert Valentine, Mark Charney
  • Publication number: 20220197635
    Abstract: In an embodiment, a processor includes: a fetch circuit to fetch instructions, the instructions including a sum of squared differences (SSD) instruction; a decode circuit to decode the SSD instruction; and an execution circuit to, during an execution of the decoded SSD instruction, generate an SSD output vector based on a plurality of input vectors, the SSD output vector including a plurality of squared difference values. Other embodiments are described and claimed.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Inventors: Deepti AGGARWAL, Michael ESPIG, Chekib NOUIRA, Robert VALENTINE, Mark CHARNEY
  • Patent number: 11366636
    Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.
    Type: Grant
    Filed: July 1, 2020
    Date of Patent: June 21, 2022
    Assignee: Intel Corporation
    Inventors: Aditya Varma, Michael Espig
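Publication 20240220248 lets the same opcodes run at either vector width, with a control value deciding whether the wider width is allowed. A minimal Python sketch of that decision, assuming hypothetical 256-bit and 512-bit widths, a hypothetical two-state `Control` enum, and a hypothetical `VectorLengthFault` exception standing in for "preventing execution" (the abstract names none of these):

```python
from enum import Enum

# Hypothetical widths: the abstract only says "narrower" and "wider".
NARROW_BITS = 256
WIDE_BITS = 512


class Control(Enum):
    """Hypothetical encoding of the control value."""
    UNRESTRICTED = 1  # "first value": both widths may execute
    RESTRICTED = 2    # "second value": only the narrower width may execute


class VectorLengthFault(Exception):
    """Hypothetical signal that execution of a wider-vector instance was prevented."""


def execute(opcode: str, vector_bits: int, control: Control) -> str:
    """Model the width check that gates execution of one vector instruction."""
    if vector_bits not in (NARROW_BITS, WIDE_BITS):
        raise ValueError(f"unsupported vector width: {vector_bits}")
    if control is Control.RESTRICTED and vector_bits == WIDE_BITS:
        # "Third instances": same opcode, wider vectors, restricted mode.
        raise VectorLengthFault(f"{opcode} with {vector_bits}-bit vectors is disabled")
    # "First instances" (unrestricted, any width) and "second instances"
    # (restricted, narrower width) execute normally.
    return f"executed {opcode} on {vector_bits}-bit vectors"
```

The key point the sketch captures is that the opcode is identical in all three cases; only the specified width and the control value decide the outcome.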
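Publication 20240103865 rounds after the multiplication and again after the add or subtract, unlike a fused multiply-add, which rounds once. A minimal Python sketch in IEEE binary32, assuming round-to-nearest-even and a hypothetical `subtract_even` flag in place of the opcode's choice of which element positions subtract and which add:

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value.

    struct.pack raises OverflowError for finite values beyond the binary32 range.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


def mul_round_addsub(a, b, c, subtract_even=True):
    """Per element i: round(round(a[i] * b[i]) -/+ c[i]) in binary32.

    subtract_even is hypothetical: True means even positions subtract and
    odd positions add; False means the opposite.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("operands must have the same number of elements")
    out = []
    for i, (x, y, z) in enumerate(zip(a, b, c)):
        # The binary64 product of two binary32 values is exact, so rounding it
        # once gives the correctly rounded binary32 product (the intermediate round).
        prod = to_f32(to_f32(x) * to_f32(y))
        subtract = (i % 2 == 0) == subtract_even
        acc = prod - to_f32(z) if subtract else prod + to_f32(z)
        # binary64 carries enough extra precision that rounding its sum to
        # binary32 matches a direct binary32 add/subtract (the second round).
        out.append(to_f32(acc))
    return out
```

With `x = 1 + 2**-12`, the exact value of `x*x - 1` is `2**-11 + 2**-24`, which a single-rounding FMA would return; the intermediate round drops the `2**-24` term (a tie rounded to even), so `mul_round_addsub([x], [x], [1.0])` gives `[2**-11]`.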
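Publication 20240103872 combines two rules: truncate toward zero when the result is merely inexact, and clamp to the integer range when it overflows. For contrast, conventional x86 truncating conversions such as CVTTPS2DQ return the "integer indefinite" value 0x80000000 for out-of-range inputs instead of saturating. A minimal Python sketch of the saturating behavior for a signed 32-bit destination, where mapping NaN to 0 is an assumption because the abstract does not say how NaN is handled:

```python
import math

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def cvt_float_to_i32_sat(values):
    """Convert each float to int32: truncate when inexact, saturate on overflow."""
    out = []
    for x in values:
        if math.isnan(x):
            out.append(0)  # ASSUMPTION: NaN handling is not given in the abstract
        elif math.isinf(x):
            out.append(INT32_MAX if x > 0 else INT32_MIN)
        else:
            t = math.trunc(x)  # round toward zero for inexact values
            out.append(max(INT32_MIN, min(INT32_MAX, t)))  # saturate on overflow
    return out
```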
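Publication 20240103866 distinguishes floating-point relations that exclude the unordered case (at least one NaN operand) from those that include it. The two relation families can be sketched in Python as below; the function names are hypothetical, and which family the instructions map to the signed and which to the unsigned integer conditions is not stated in the abstract, so the sketch does not model that mapping:

```python
import math


def is_unordered(a: float, b: float) -> bool:
    """Two floats are unordered when at least one of them is NaN."""
    return math.isnan(a) or math.isnan(b)


def lt_ordered(a: float, b: float) -> bool:
    """'Less than', false whenever the operands are unordered."""
    return a < b  # Python's < is already False if either operand is NaN


def lt_or_unordered(a: float, b: float) -> bool:
    """'Less than or unordered', true if a < b or either operand is NaN."""
    return a < b or is_unordered(a, b)
```

Without a condition that folds in the unordered case, code typically needs an extra explicit NaN test, the `is_unordered` call here, which is the kind of instruction the abstract says these comparisons may remove.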
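The sparse multiply-accumulate in publication 20240078285 and patent 11847185 builds non-zero bitmasks for rows of the first matrix and columns of the second, then multiplies only where both are non-zero. A minimal Python sketch of that bitmask matching, with hypothetical function names; it does not model the PE grid or the broadcast of up to two NZ elements at a time:

```python
def nz_mask(vec):
    """Return a bitmask whose bit k is set when vec[k] is non-zero."""
    mask = 0
    for k, v in enumerate(vec):
        if v != 0:
            mask |= 1 << k
    return mask


def sparse_mac(A, B, C):
    """Return C + A @ B, multiplying only where both operands are non-zero."""
    n, inner, m = len(A), len(B), len(B[0])
    row_masks = [nz_mask(row) for row in A]  # one mask per row of A
    col_masks = [nz_mask([B[k][j] for k in range(inner)]) for j in range(m)]  # per column of B
    out = [row[:] for row in C]
    for i in range(n):
        for j in range(m):
            both = row_masks[i] & col_masks[j]  # positions where A[i][k] and B[k][j] are non-zero
            while both:
                k = (both & -both).bit_length() - 1  # index of the lowest set bit
                out[i][j] += A[i][k] * B[k][j]
                both &= both - 1  # clear that bit
    return out
```

Skipped products are exactly those with a zero factor, so the result equals the dense `C + A @ B`.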
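The first compression scheme in publication 20240045690 and patent 11748103 packs the non-zero elements together and records each one's matrix position in a header. A minimal Python sketch, assuming the header is simply a list of `(row, col)` pairs (the real encoding is not given) and omitting the alternative fewer-bits-per-element scheme:

```python
def compress(matrix):
    """Pack non-zero elements; the header records each one's (row, col)."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append((r, c))
                values.append(v)
    return (rows, cols), header, values


def decompress(shape, header, values):
    """Rebuild the dense matrix by scattering packed values to their positions."""
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(header, values):
        out[r][c] = v
    return out
```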
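Patent 11886875 treats each matrix element as a set of independent 4-bit partitions. A minimal Python sketch, assuming 32-bit elements and wraparound within each nibble (so nothing carries into the neighboring nibble); both assumptions, and the function names, are hypothetical:

```python
import operator


def nibblewise(a: int, b: int, op, bits: int = 32) -> int:
    """Apply op to each 4-bit partition of a and b independently.

    ASSUMPTION: each nibble result is truncated to 4 bits (wraparound).
    """
    result = 0
    for shift in range(0, bits, 4):
        na = (a >> shift) & 0xF
        nb = (b >> shift) & 0xF
        result |= (op(na, nb) & 0xF) << shift
    return result


def nibblewise_matrix(A, B, op, bits: int = 32):
    """Apply nibblewise to corresponding elements of two equally shaped matrices."""
    return [[nibblewise(x, y, op, bits) for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]
```

For example, `nibblewise(0x0F, 0x01, operator.add)` yields `0x00`, whereas an ordinary addition gives `0x10`, because the carry out of the low nibble is discarded.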
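Publication 20240004662 reduces the non-masked elements of a source operand to a single result. A minimal Python sketch, assuming bit i of an integer mask enables element i and that an all-masked input returns the operation's identity (neither detail is given in the abstract):

```python
import operator
from functools import reduce


def masked_horizontal_reduce(src, mask, op=operator.add, identity=0):
    """Reduce the elements of src whose mask bit is set (bit i enables element i)."""
    active = [v for i, v in enumerate(src) if (mask >> i) & 1]
    return reduce(op, active, identity)
```

Passing `op=max, identity=float("-inf")` turns the same helper into a masked horizontal maximum.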
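Publication 20230185873 derives two 1D kernels from a 2D kernel and applies them in two passes. When the 2D kernel has rank 1, it factors into a column kernel and a row kernel, and two 1D passes reproduce the 2D result with about 2K instead of K² multiplies per output for a K×K kernel. A minimal pure-Python sketch of that factoring, with hypothetical function names; it uses correlation (no kernel flip, as in most ML "convolution" layers), "valid" output size, and does not model the FMA array:

```python
import math


def split_separable(kernel):
    """Factor a rank-1 2D kernel so that kernel[i][j] == col[i] * row[j]."""
    pivot = next(((i, j) for i, r in enumerate(kernel) for j, v in enumerate(r) if v != 0), None)
    if pivot is None:
        raise ValueError("kernel is all zeros")
    i0, j0 = pivot
    col = [r[j0] for r in kernel]
    row = [v / kernel[i0][j0] for v in kernel[i0]]
    for i, r in enumerate(kernel):
        for j, v in enumerate(r):
            if not math.isclose(v, col[i] * row[j], rel_tol=1e-9, abs_tol=1e-12):
                raise ValueError("kernel is not separable (not rank 1)")
    return col, row


def pass_rows(img, k):
    """First pass: 1D correlation along each row ('valid' output)."""
    w = len(k)
    return [[sum(r[x + t] * k[t] for t in range(w)) for x in range(len(r) - w + 1)] for r in img]


def pass_cols(img, k):
    """Second pass: 1D correlation down each column ('valid' output)."""
    h = len(k)
    return [[sum(img[y + t][x] * k[t] for t in range(h)) for x in range(len(img[0]))]
            for y in range(len(img) - h + 1)]


def direct_2d(img, kernel):
    """Reference: direct 2D correlation ('valid' output)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(img[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
             for x in range(len(img[0]) - kw + 1)]
            for y in range(len(img) - kh + 1)]
```

Usage: `col, row = split_separable(K)` followed by `pass_cols(pass_rows(img, row), col)` matches `direct_2d(img, K)` for a separable `K` such as the 3×3 binomial kernel.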
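Patent 11507376 unpacks N structures of K fields into K same-typed groups. A minimal Python sketch on a flat list, assuming tightly packed structures so that the stride between same-typed elements equals K (the patent allows a general stride; the function name is hypothetical):

```python
def aos_to_soa(aos, k):
    """Unpack N structures of K fields (flat AOS list) into K lists of N values."""
    if k <= 0 or len(aos) % k != 0:
        raise ValueError("AOS length must be a positive multiple of K")
    # Same-typed elements of consecutive structures are k positions apart (the stride).
    return [aos[field::k] for field in range(k)]
```

For example, two (x, y, z) points stored as `[1, 2, 3, 10, 20, 30]` unpack into `[[1, 10], [2, 20], [3, 30]]`.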
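Publications 20220308881 and 20220197635 produce output vectors of per-element absolute and squared differences. A minimal Python sketch of those element-wise outputs, with hypothetical function names; summing an output vector gives the familiar block SAD or SSD score used, for example, in motion estimation:

```python
def sad_vector(a, b):
    """Element-wise absolute differences |a[i] - b[i]|."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [abs(x - y) for x, y in zip(a, b)]


def ssd_vector(a, b):
    """Element-wise squared differences (a[i] - b[i]) ** 2."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [(x - y) ** 2 for x, y in zip(a, b)]
```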