Patents by Inventor Martin Langhammer

Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240126558
    Abstract: A processor includes a shared memory, and an instruction unit to receive a single instruction, multiple thread (SIMT) instruction having a first source register identifier and a second source register identifier. The SIMT instruction indicates a number of data values to be written to the shared memory concurrently. A SIMT processor includes processor elements each to execute instructions of a different corresponding thread of a parallel thread group. Each of a number of processor elements, equal in number to the number of data values, is to execute the SIMT instruction to concurrently write a different corresponding one of the number of data values from a first source register of the respective processor element identified by the first source register identifier to the shared memory at an address based on address information from a second source register of the respective processor element identified by the second source register identifier.
    Type: Application
    Filed: December 20, 2023
    Publication date: April 18, 2024
    Inventor: Martin Langhammer
  • Patent number: 11960853
    Abstract: Folded integer multiplier (FIM) circuitry includes a multiplier configurable to perform multiplication and a first addition/subtraction unit and a second addition/subtraction unit both configurable to perform addition and subtraction. The FIM circuitry is configurable to determine each product of a plurality of products for a plurality of pairs of input values having a first number of bits by performing, using the first and second addition/subtraction units, a plurality of operations involving addition or subtraction, and performing, using the multiplier, a plurality of multiplication operations involving values having fewer bits than the first number of bits. The plurality of multiplication operations includes a first number of multiplication operations, and the multiplier is configurable to begin performing all multiplication operations of the plurality of multiplication operations within a first number of clock cycles equal to the first number of multiplication operations.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: April 16, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Mihai Pasca
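The abstract above describes forming a wide product from narrower multiplications plus additions and subtractions. It does not name the decomposition, so the sketch below uses the well-known Karatsuba split purely as an assumed illustration of that general idea; the function name `split_multiply` is hypothetical and nothing here models the clock-cycle scheduling the abstract mentions.

```python
def split_multiply(a: int, b: int, bits: int) -> int:
    """Multiply two unsigned `bits`-wide integers with three narrower multiplies.

    Assumption: Karatsuba decomposition, chosen only for illustration.
    """
    half = bits // 2
    mask = (1 << half) - 1

    # Split each operand into a high part and a low part.
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask

    # Three multiplications on operands narrower than `bits`.
    low = a_lo * b_lo
    high = a_hi * b_hi
    middle = (a_hi + a_lo) * (b_hi + b_lo)

    # Subtractions recover the cross term a_hi*b_lo + a_lo*b_hi.
    cross = middle - high - low

    # Additions (with shifts) reassemble the full product.
    return (high << (2 * half)) + (cross << half) + low
```

The middle multiplication takes operands one bit wider than `half`, which is still narrower than the original width for any `bits` of 4 or more.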
  • Publication number: 20240118870
    Abstract: Integrated circuit devices, methods, and circuitry for a digital signal processing (DSP) block that can selectively perform higher-precision DSP multiplication operations or lower-precision AI tensor multiplication operations. Flexible digital signal processing circuitry may include hardened multipliers, hardened summation circuitry, and an intermediate multiplexer network. The intermediate multiplexer network may be configurable to, in a first configuration, route data between the plurality of hardened multipliers and the hardened summation circuitry to perform a plurality of lower-precision multiplication operations. In a second configuration, the intermediate multiplexer network may route the data between the plurality of hardened multipliers and the hardened summation circuitry to perform at least one higher-precision multiplication operation.
    Type: Application
    Filed: September 27, 2023
    Publication date: April 11, 2024
    Inventor: Martin Langhammer
  • Publication number: 20240113699
    Abstract: Integrated circuit devices, methods, and circuitry for implementing and using a flexible circuit for real and complex filter operations are provided. An integrated circuit may include programmable logic circuitry and digital signal processor (DSP) blocks. The DSP blocks may be configurable to receive inputs from the programmable logic circuitry and may include first and second multiplier pairs. The first multiplier pair may include a first multiplier that may receive a first input and a second input and a second multiplier that may receive the second input and a third input of the inputs. The second multiplier pair may include a third multiplier that may receive the first input or a fourth input and a fifth input and a fourth multiplier that may receive the third input or a fifth input and a sixth input.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Inventor: Martin Langhammer
  • Publication number: 20240078211
    Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
    Type: Application
    Filed: September 14, 2023
    Publication date: March 7, 2024
    Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
  • Patent number: 11907719
    Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl
  • Patent number: 11899746
    Abstract: The present disclosure relates generally to techniques for efficiently performing operations associated with artificial intelligence (AI), machine learning (ML), and/or deep learning (DL) applications, such as training and/or inference calculations, using an integrated circuit device. More specifically, the present disclosure relates to an integrated circuit design implemented to perform these operations with low latency and/or a high bandwidth of data. For example, embodiments of a computationally dense digital signal processing (DSP) circuitry, implemented to efficiently perform one or more arithmetic operations (e.g., a dot-product) on an input are disclosed. Moreover, embodiments described herein may relate to layout, design, and data scheduling of a processing element array implemented to compute matrix multiplications (e.g., systolic array multiplication).
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: February 13, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Andrei-Mihai Hagiescu-Miriste
  • Publication number: 20230367547
    Abstract: A first storage location is to store a first floating-point data element. The first data element has a sign bit, an N-bit first exponent value, and M bits. A second storage location is to store a second floating-point data element that is to have a same number of bits as the first floating-point data element. The second data element has a sign bit, an N-bit first exponent value, and M bits. The N-bit first exponent value of the second data element is all zeroes and the M bits of the second data element include a significand and a second exponent value. A floating-point arithmetic unit is coupled with the first and second storage locations. The floating-point arithmetic unit is to perform either multiplication or addition on the first and second data elements to generate a result data element based at least in part on the second exponent value.
    Type: Application
    Filed: June 15, 2023
    Publication date: November 16, 2023
    Inventor: Martin Langhammer
  • Publication number: 20230368030
    Abstract: Weights can be pruned during DNN training to increase sparsity in the weights and reduce the amount of computation required for performing the deep learning operations in DNNs. A DNN layer may have one or more weight tensors corresponding to one or more output channels of the layer. A weight tensor has weights, the values of which are determined by training the DNN. A weight tensor may have a dimension corresponding to the input channels of the layer. The weight tensor may be partitioned into subtensors, each of which has a subset of the input channels. The subtensors may have the same number of input channels. One or more subtensors may be selected, e.g., based on the weights in the one or more subtensors. The weights in a selected subtensor are pruned, e.g., changed to zeros. The weights in an unselected subtensor may be modified by further training the DNN.
    Type: Application
    Filed: July 25, 2023
    Publication date: November 16, 2023
    Inventors: Arnab Raha, Michael Wu, Deepak Abraham Mathaikutty, Martin Langhammer, Nihat Tunali
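The partition, select, and prune steps in the abstract above can be sketched roughly as follows. The selection criterion (smallest sum of absolute weights), the flat list representation, and the name `prune_subtensors` are assumptions made for illustration; the abstract only says selection may be based on the weights.

```python
def prune_subtensors(weights, group_size, num_to_prune):
    """Zero out the `num_to_prune` weakest equal-sized groups of input channels.

    `weights` is a flat list with one weight per input channel for a single
    output channel. Ranking groups by sum of absolute values is an assumption.
    """
    # Partition the input-channel dimension into subtensors of equal size.
    groups = [weights[i:i + group_size]
              for i in range(0, len(weights), group_size)]

    # Select the subtensors whose weights have the smallest total magnitude.
    order = sorted(range(len(groups)),
                   key=lambda g: sum(abs(w) for w in groups[g]))
    selected = set(order[:num_to_prune])

    # Prune selected subtensors to zeros; leave the others for further training.
    result = []
    for g, group in enumerate(groups):
        result.extend([0.0] * len(group) if g in selected else group)
    return result
```

In a training loop, the unselected weights would continue to be updated while the pruned positions stay at zero; that part is omitted here.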
  • Patent number: 11809798
    Abstract: The present disclosure describes an integrated circuit device that includes a digital signal processing (DSP) block. The DSP block includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Also, the first plurality of inputs, the second plurality of inputs, or both are derived from higher precision values. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: November 7, 2023
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Simon Peter Finn
  • Publication number: 20230342111
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Application
    Filed: June 30, 2023
    Publication date: October 26, 2023
    Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
  • Patent number: 11797473
    Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
    Type: Grant
    Filed: October 8, 2018
    Date of Patent: October 24, 2023
    Assignee: Altera Corporation
    Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
  • Publication number: 20230333857
    Abstract: A processor of an aspect includes an instruction unit to receive a single instruction, multiple thread (SIMT) instruction. The SIMT instruction has at least one field to provide at least one value. The at least one value is to indicate a plurality of threads that are to execute the SIMT instruction. The processor also includes a SIMT processor coupled with the instruction unit. The SIMT processor is to execute the SIMT instruction for each of the plurality of threads. Other processors, methods, systems, and machine-readable media storing such SIMT instructions are also disclosed.
    Type: Application
    Filed: June 22, 2023
    Publication date: October 19, 2023
    Inventor: Martin Langhammer
  • Patent number: 11789641
    Abstract: A three dimensional circuit system includes a first integrated circuit die having a core logic region that has first memory circuits and logic circuits. The three dimensional circuit system includes a second integrated circuit die that has second memory circuits. The first and second integrated circuit dies are coupled together in a vertically stacked configuration. The three dimensional circuit system includes third memory circuits coupled to the first integrated circuit die. The third memory circuits reside in a plane of the first integrated circuit die. The logic circuits are coupled to access the first, second, and third memory circuits and data can move between the first, second, and third memories. The third memory circuits have a larger memory capacity and a smaller memory access bandwidth than the second memory circuits. The second memory circuits have a larger memory capacity and a smaller memory access bandwidth than the first memory circuits.
    Type: Grant
    Filed: June 16, 2021
    Date of Patent: October 17, 2023
    Assignee: Intel Corporation
    Inventors: Scott Weber, Jawad Khan, Ilya Ganusov, Martin Langhammer, Matthew Adiletta, Terence Magee, Albert Fazio, Richard Coulson, Ravi Gutala, Aravind Dasu, Mahesh Iyer
  • Publication number: 20230325665
    Abstract: Gate switching in deep learning operations can be reduced based on sparsity in the input data. A first element of an activation operand and a first element of a weight operand may be stored in input storage units associated with a multiplier in a processing element. The multiplier computes a product of the two elements, which may be stored in an output storage unit of the multiplier. After detecting that a second element of the activation operand or a second element of the weight operand is zero valued, gate switching is reduced by avoiding at least one gate switching needed for the multiply-accumulation operation. For instance, the input storage units may not be updated. A zero-valued data element may be stored in the output storage unit of the multiplier and used as a product of the second element of the activation operand and the second element of the weight operand.
    Type: Application
    Filed: May 30, 2023
    Publication date: October 12, 2023
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Arnab Raha, Martin Power
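As a rough software analogy for the zero-skipping behavior in the abstract above, the sketch below models a multiplier with input and output registers and counts how often the input registers are rewritten. The class `GatedMultiplier` and its update counter are hypothetical; real gate switching is a hardware property that this model only approximates.

```python
class GatedMultiplier:
    """Toy model of a multiplier that skips register updates on zero operands."""

    def __init__(self):
        self.act_reg = 0        # input storage for the activation element
        self.weight_reg = 0     # input storage for the weight element
        self.out_reg = 0        # output storage for the product
        self.input_updates = 0  # stand-in for switching activity (assumption)

    def step(self, activation, weight):
        if activation == 0 or weight == 0:
            # Leave the input registers untouched and store zero as the product.
            self.out_reg = 0
        else:
            self.act_reg, self.weight_reg = activation, weight
            self.input_updates += 1
            self.out_reg = self.act_reg * self.weight_reg
        return self.out_reg


def gated_dot(activations, weights):
    """Accumulate products through one GatedMultiplier; return (sum, updates)."""
    mult = GatedMultiplier()
    total = sum(mult.step(a, w) for a, w in zip(activations, weights))
    return total, mult.input_updates
```

For sparse operands the accumulated result matches an ordinary dot product while the number of input-register updates drops to the number of pairs in which both elements are nonzero.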
  • Publication number: 20230273770
    Abstract: Integrated circuit devices, methods, and circuitry for implementing and using an iterative multiplicative modular reduction circuit are provided. Such circuitry may include polynomial multiplication circuitry and modular reduction circuitry that may operate concurrently. The polynomial multiplication circuitry may multiply a first input value to a second input value to compute a product. The modular reduction circuitry may perform modular reduction on a first component of the product while the polynomial multiplication circuitry is still generating other components of the product.
    Type: Application
    Filed: March 16, 2023
    Publication date: August 31, 2023
    Inventors: Sergey Vladimirovich Gribok, Martin Langhammer, Bogdan Pasca
  • Patent number: 11726744
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: August 15, 2023
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
  • Publication number: 20230239136
    Abstract: Integrated circuits, methods, and circuitry are provided for performing multiplication such as that used in Galois field counter mode (GCM) hash computations. An integrated circuit may include selection circuitry to provide one of several powers of a hash key. A Galois field multiplier may receive the one of the powers of the hash key and a hash sequence and generate one or more values. The Galois field multiplier may include multiple levels of pipeline stages. An adder may receive the one or more values and provide a summation of the one or more values in computing a GCM hash.
    Type: Application
    Filed: March 31, 2023
    Publication date: July 27, 2023
    Inventors: Sergey Vladimirovich Gribok, Gregg William Baeckler, Bogdan Pasca, Martin Langhammer
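To illustrate why several powers of a hash key are useful in the GCM hash described above, the sketch below compares the serial (Horner) form of the hash with an equivalent sum of independent products, each using a different power of the key. It is a simplified model: it uses the GCM field polynomial x^128 + x^7 + x^2 + x + 1 but plain bit ordering (bit i is the coefficient of x^i) rather than the reflected ordering of the GCM specification, and the function names are hypothetical.

```python
R = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def gf_mul(a, b):
    """Multiply two field elements in GF(2^128), plain bit ordering."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2) is XOR
        b >>= 1
        a <<= 1
        if a >> 128:         # reduce whenever the degree reaches 128
            a ^= R
    return result


def ghash_serial(h, blocks):
    """Horner form: each multiplication depends on the previous one."""
    y = 0
    for x in blocks:
        y = gf_mul(y ^ x, h)
    return y


def ghash_with_powers(h, blocks):
    """Sum of independent products block_i * h^(n - i)."""
    n = len(blocks)
    powers = [h]                       # powers[k] holds h^(k + 1)
    for _ in range(n - 1):
        powers.append(gf_mul(powers[-1], h))
    acc = 0
    for i, x in enumerate(blocks):
        acc ^= gf_mul(x, powers[n - 1 - i])
    return acc
```

Because the products in `ghash_with_powers` do not depend on one another, they could in principle be issued back to back into a pipelined multiplier and summed afterwards, which is the general shape the abstract describes.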
  • Publication number: 20230229917
    Abstract: A compute block can perform hybrid multiply-accumulate (MAC) operations. The compute block may include a weight compression module and a processing element (PE) array. The weight compression module may select a first group of one or more weights and a second group of one or more weights from a weight tensor of a DNN (deep neural network) layer. A weight in the first group is quantized to a power-of-two value. A weight in the second group is quantized to an integer. The integer and the exponent of the power-of-two value may be stored in a memory in lieu of the original values of the weights. A PE in the PE array includes a shifter configured to shift an activation of the layer by the exponent of the power-of-two value and a multiplier configured to multiply the integer with another activation of the layer.
    Type: Application
    Filed: March 15, 2023
    Publication date: July 20, 2023
    Applicant: Intel Corporation
    Inventors: Michael Wu, Arnab Raha, Deepak Abraham Mathaikutty, Nihat Tunali, Martin Langhammer
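The shifter-plus-multiplier arrangement in the abstract above can be illustrated with a small sketch. The tagged-tuple encoding of weights and the name `hybrid_mac` are assumptions, and for simplicity the sketch handles only integer activations and non-negative exponents.

```python
def hybrid_mac(activations, encoded_weights):
    """Accumulate activation * weight using a shift or a multiply per weight.

    Each encoded weight is a hypothetical (kind, value) pair:
      ("shift", e) stands for a weight quantized to 2**e, stored as exponent e
      ("int", k)   stands for a weight quantized to the integer k
    """
    acc = 0
    for activation, (kind, value) in zip(activations, encoded_weights):
        if kind == "shift":
            # Shifter path: multiplying by 2**e is a left shift by e.
            acc += activation << value
        elif kind == "int":
            # Multiplier path: ordinary integer multiplication.
            acc += activation * value
        else:
            raise ValueError(f"unknown weight encoding: {kind!r}")
    return acc
```

For example, `hybrid_mac([3, 5, 2], [("shift", 2), ("int", 7), ("shift", 0)])` accumulates 3 * 4 + 5 * 7 + 2 * 1.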
  • Publication number: 20230222275
    Abstract: A method is provided for processing code for a circuit design for an integrated circuit using a computer system. The method includes receiving at least a portion of the code for the circuit design for the integrated circuit, wherein the portion of the code comprises an error or has incomplete constraints, making an assumption about the error and the missing constraints using a computer aided design tool, and generating a revised circuit design for the integrated circuit with the error corrected and any missing constraints added based on the assumption and based on the code using the computer aided design tool and a library of components for circuit designs.
    Type: Application
    Filed: March 16, 2023
    Publication date: July 13, 2023
    Applicant: Intel Corporation
    Inventors: Gregg Baeckler, Mahesh A. Iyer, Martin Langhammer