Patents by Inventor Martin Langhammer

Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250094378
    Abstract: A single instruction, multiple thread (SIMT) processor of an aspect includes a register file having a plurality of sets of registers. Each of the plurality of sets of registers corresponds to a different thread of a parallel thread group. The SIMT processor also includes a storage coupled with the register file. The storage has a plurality of sets of one or more data element storage locations. Each of the plurality of sets of one or more data element storage locations corresponds to a different thread of the parallel thread group. Each of the sets of one or more data element storage locations is to store a copy of one or more data elements from only a subset of the set of registers for the corresponding thread. Other SIMT processors, methods, and systems are also disclosed.
    Type: Application
    Filed: December 2, 2024
    Publication date: March 20, 2025
    Inventor: Martin LANGHAMMER
  • Patent number: 12254316
    Abstract: The present disclosure relates to an integrated circuit device that includes a plurality of vector registers configurable to store a plurality of vectors and switch circuitry communicatively coupled to the plurality of vector registers. The switch circuitry is configurable to route a portion of the plurality of vectors. Additionally, the integrated circuit device includes a plurality of vector processing units communicatively coupled to the switch circuitry. The plurality of vector processing units is configurable to receive the portion of the plurality of vectors, perform one or more operations involving the portion of the plurality of vector inputs, and output a second plurality of vectors generated by performing the one or more operations.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: March 18, 2025
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Eriko Nurvitadhi, Gregg William Baeckler
  • Publication number: 20250060940
    Abstract: A data processing unit may include a memory, processing elements (PEs), and a control unit. The memory may store weight blocks within a weight tensor of a neural network operation. Each weight block has an input channel (IC) dimension and an output channel (OC) dimension and includes subblocks. A subblock includes one or more weights having a first data precision and one or more other weights having a second data precision. The second data precision is lower than the first data precision. The control unit may distribute different ones of the subblocks to different ones of the PEs. A PE may receive a subblock and perform a first MAC operation on a weight having a first data precision and a second MAC operation on a weight having a second data precision. The first MAC operation may consume more computation cycles or more multipliers than the second MAC operation.
    Type: Application
    Filed: October 30, 2024
    Publication date: February 20, 2025
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Michael Wu, Deepak Abraham Mathaikutty, Daksha Sharma, Martin Langhammer
  • Publication number: 20250045017
    Abstract: Integrated circuit devices and circuitry for implementing and using efficient circuitry for summation of tensors having shared exponents and conversion into a floating-point format are provided. Such circuitry may include first input circuitry to receive a first tensor in a fixed-point format having a first shared exponent and second input circuitry to receive a second tensor in the fixed-point format with a second shared exponent. Addition circuitry may add the first tensor and the second tensor, without first converting the first tensor and the second tensor to a floating-point format, to obtain a result in the floating-point format.
    Type: Application
    Filed: September 27, 2024
    Publication date: February 6, 2025
    Inventors: Martin Langhammer, Bogdan Pasca, Dongdong Chen, Ilya Ganusov
  • Publication number: 20250021305
    Abstract: Integrated circuit devices, methods, and circuitry for implementing filters based on multipliers in tensor circuits are provided. Integrated circuitry may include a first tensor circuit with a first set of multipliers of a first precision and first summation circuitry and a second tensor circuit with a second set of multipliers of a second precision and second summation circuitry. The first tensor circuit and the second tensor circuit may collectively perform a multiplication operation at a third precision higher than the first precision and the second precision.
    Type: Application
    Filed: September 27, 2024
    Publication date: January 16, 2025
    Inventors: Martin Langhammer, Volker Mauer, Gregory Ives, Dongdong Chen, Bogdan Pasca
  • Patent number: 12197888
    Abstract: Integrated circuits with dot product circuitry are provided. The dot product circuitry may be configured to generate partial products of different ranks based on the inputs. The partial products may be organized into corresponding groups based on their ranks. Each group of partial products having the same rank can then be compressed using a compressor/reduction tree. At least some of the compressed partial product values may be shifted between the different groups to maintain the proper offset. Each partial product may have an associated one's to two's complement conversion bit. The conversion bits of the various partial product groups can be separately aggregated and then injected into the compressor tree at one or more locations.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: January 14, 2025
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Publication number: 20250013431
    Abstract: Integrated circuit devices, methods, and circuitry for implementing and using a hybrid modular multiplier circuit using a number of different modular reduction techniques are provided. Integrated circuitry may include multiplication circuitry to multiply an input multiplicand value with an input multiplier value to obtain a product, first coarse-grain modular reduction circuitry to partially reduce the product based on a modulus value using a first type of modular reduction, second coarse-grain modular reduction circuitry to further reduce the product based on the modulus value using a second type of modular reduction, and fine-grain modular reduction circuitry to finally reduce the product based on the modulus value using a third type of modular reduction to produce a final modular reduction result.
    Type: Application
    Filed: September 26, 2024
    Publication date: January 9, 2025
    Inventors: Sergey Vladimirovich Gribok, Martin Langhammer, Bogdan Pasca
  • Patent number: 12182534
    Abstract: A digital signal processing (DSP) block includes a plurality of multipliers and a summation block separate from the plurality of multipliers. The DSP block is configurable to perform a first multiplication operation to determine a first product of a first floating-point value and a second floating-point value using only a first multiplier of the plurality of multipliers. Additionally, the DSP block is configurable to perform a second multiplication operation between a third floating-point value and a fourth floating-point value by receiving, at each of the plurality of multipliers, two integer values generated from the third floating-point value and the fourth floating-point value, generating, via the plurality of multipliers, a plurality of subproducts by multiplying, at each of the multipliers, the two integer values, and generating a second product of the second multiplication operation by adding, via the summation block, the plurality of subproducts.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: December 31, 2024
    Assignee: Intel Corporation
    Inventor: Martin Langhammer
  • Publication number: 20240419444
    Abstract: A processor of an aspect includes decoder circuitry to decode an instruction indicating a source floating-point operand, having a floating-point data element, and indicating a destination register. The element has a sign bit, an N-bit first exponent value, and M bits. Execution circuitry of the processor is to interpret the M bits as an M-bit significand, when the N-bit first exponent value is not all zeroes or all ones, and interpret the M bits as including a second exponent value in at least one of the M bits, and a less than M-bit significand in at least one other of the M bits, when the N-bit first exponent value is either all zeroes or all ones. The execution unit is to perform an operation on the source floating-point operand to generate a result floating-point operand, and to store the result floating-point operand in the destination register.
    Type: Application
    Filed: June 15, 2023
    Publication date: December 19, 2024
    Inventors: Martin LANGHAMMER, Alexander F. HEINECKE
  • Publication number: 20240394448
    Abstract: A method for implementing a programmable device is provided. The method may include extracting an underlay from an existing routing network on the programmable device and then mapping a user design to the extracted underlay. The underlay may represent a subset of fast routing wires satisfying predetermined constraints. The underlay may be composed of multiple repeating adjacent logic blocks, each implementing some datapath reduction operation. Implementing circuit designs in this way can dramatically improve circuit performance while cutting down compile times by more than half.
    Type: Application
    Filed: August 6, 2024
    Publication date: November 28, 2024
    Inventors: Gregg William Baeckler, Martin Langhammer
  • Patent number: 12135955
    Abstract: An integrated circuit device includes multiplier circuitry configured to determine a plurality of columns of subproducts by multiplying a plurality of values. Each column of the plurality of columns includes one or more subproducts of a plurality of subproducts. The integrated circuit device also includes adder circuitry configured to determine a plurality of sums, each sum being a sum of one column of the plurality of columns. A first portion of the adder circuitry associated with a first column of the plurality of columns is configured to receive a first value and second value that are associated with the first column and a third value associated with a second column of the plurality of columns that differs from the first column. The third value is a carry-out value generated by a second portion of the adder circuitry associated with the second column of the plurality of columns.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: November 5, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Mihai Pasca
  • Publication number: 20240345804
    Abstract: The present disclosure relates generally to techniques for adjusting the number representation (e.g., format) of a variable before and/or after performing one or more arithmetic operations on the variable. In particular, the present disclosure relates to scaling the range of a variable to a suitable representation based on available hardware (e.g., hard logic) in an integrated circuit device. For example, an input in a first number format (e.g., bfloat16) may be scaled to a second number format (e.g., half-precision floating-point) so that circuitry implemented to receive inputs in the second number format may perform one or more arithmetic operations on the input. Further, the output produced by the circuitry may be scaled back to the first number format. Accordingly, arithmetic operations, such as a dot-product, performed in a first format may be emulated by scaling the inputs to and/or the outputs from arithmetic operations performed in another format.
    Type: Application
    Filed: June 26, 2024
    Publication date: October 17, 2024
    Inventors: Bogdan Mihai Pasca, Martin Langhammer
  • Publication number: 20240329991
    Abstract: An apparatus of an aspect includes decoder circuitry to decode an instruction. The instruction to indicate at least one source floating-point vector, a destination storage location, and at least one value. The source floating-point vector is to have floating-point data elements. The at least one value is to indicate at least one of: (a) a number of significand bits of the floating-point data elements; (b) a number of exponent bits of the floating-point data elements; (c) exponent bias information for the floating-point data elements; or (d) any combination thereof. Execution circuitry coupled with decoder circuitry is to perform operations according to the instruction. The operations include to interpret the floating-point data elements consistent with the at least one value, perform an operation specified by the instruction on the at least one source floating-point vector to generate a result vector, and store the result vector in the destination storage location.
    Type: Application
    Filed: March 31, 2023
    Publication date: October 3, 2024
    Inventors: Martin LANGHAMMER, Alexander F. HEINECKE
  • Patent number: 12086518
    Abstract: A method for implementing a programmable device is provided. The method may include extracting an underlay from an existing routing network on the programmable device and then mapping a user design to the extracted underlay. The underlay may represent a subset of fast routing wires satisfying predetermined constraints. The underlay may be composed of multiple repeating adjacent logic blocks, each implementing some datapath reduction operation. Implementing circuit designs in this way can dramatically improve circuit performance while cutting down compile times by more than half.
    Type: Grant
    Filed: June 1, 2020
    Date of Patent: September 10, 2024
    Assignee: Intel Corporation
    Inventors: Gregg William Baeckler, Martin Langhammer
  • Patent number: 12079590
    Abstract: Systems and methods related to performing arithmetic operations on floating-point numbers. Floating-point arithmetic circuitry is configured to receive two floating-point numbers. The floating-point arithmetic circuitry includes a first path configured to perform a first operation on the two floating-point numbers based at least in part on a difference in size between the two floating-point numbers. The floating-point arithmetic circuitry includes a second path configured to perform a second operation on the two floating-point numbers based at least in part on the difference in size between the two floating-point numbers. The first path and the second path diverge from each other after receipt of the floating-point numbers in the floating-point arithmetic circuitry and converge on a shared adder that is used for the first operation and the second operation.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: September 3, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Theo Drane
  • Publication number: 20240289168
    Abstract: Systems, apparatuses and methods may provide for technology that identifies a type of a first activation function, identifies a derivative level of the first activation function, and generates a first instruction based on the type of the first activation function and the derivative level of the first activation function. The technology also includes an accelerator having logic coupled to one or more substrates, the logic including a compute engine including a plurality of arithmetic operators, a multiplexer network coupled to the compute engine, and a controller coupled to the multiplexer network, the controller to detect the first instruction, decode the first instruction to identify the first activation function, and drive the multiplexer network to form first connections between two or more of the plurality of arithmetic operators in accordance with the first activation function, wherein the first connections are to cause the compute engine to conduct the first activation function.
    Type: Application
    Filed: November 10, 2023
    Publication date: August 29, 2024
    Inventors: Krishnan Ananthanarayanan, Martin Langhammer, Om Ji Omer, Bogdan Pasca, Kamlesh Pillai, Pramod Udupa
  • Patent number: 12056461
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: August 6, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen
  • Patent number: 12045581
    Abstract: The present disclosure relates generally to techniques for adjusting the number representation (e.g., format) of a variable before and/or after performing one or more arithmetic operations on the variable. In particular, the present disclosure relates to scaling the range of a variable to a suitable representation based on available hardware (e.g., hard logic) in an integrated circuit device. For example, an input in a first number format (e.g., bfloat16) may be scaled to a second number format (e.g., half-precision floating-point) so that circuitry implemented to receive inputs in the second number format may perform one or more arithmetic operations on the input. Further, the output produced by the circuitry may be scaled back to the first number format. Accordingly, arithmetic operations, such as a dot-product, performed in a first format may be emulated by scaling the inputs to and/or the outputs from arithmetic operations performed in another format.
    Type: Grant
    Filed: April 1, 2022
    Date of Patent: July 23, 2024
    Assignee: Intel Corporation
    Inventors: Bogdan Mihai Pasca, Martin Langhammer
  • Patent number: 12020000
    Abstract: Systems and methods include arithmetic circuitry that generates a floating-point mantissa and includes a propagation network that calculates the floating-point mantissa based on input bits. The systems and methods also include rounding circuitry that rounds the floating-point mantissa. The rounding circuitry includes a multiplexer at a rounding location for the floating-point mantissa that selectively inputs a first input bit of the input bits or a rounding bit. The rounding circuitry also includes an OR gate that ORs a second input bit of the input bits with the rounding bit. Moreover, the second input bit is a less significant bit than the first input bit.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: June 25, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Alexander Heinecke
  • Publication number: 20240176619
    Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
    Type: Application
    Filed: February 7, 2024
    Publication date: May 30, 2024
    Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl
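
The abstract for publication 20240176619 above describes multipliers that multiply each stored value by each incoming value. As a rough software analogy only, that all-pairs multiplication can be modeled as an outer product. The function name, variable names, and sample values below are hypothetical and do not come from the application, and the sketch says nothing about how the hardware is actually organized or timed.

```python
def all_pairs_products(stored_values, incoming_values):
    """Return one row per stored value, holding its product with every incoming value.

    This is a plain software model of "multiply each value of the first
    plurality by each value of the second plurality"; real hardware would
    compute these products in parallel rather than in nested loops.
    """
    return [[s * x for x in incoming_values] for s in stored_values]


# Hypothetical sample data: values preloaded into weight registers,
# followed by values arriving on the inputs.
weight_register_values = [2, 3, 5]
input_values = [1, 10]

products = all_pairs_products(weight_register_values, input_values)
print(products)
```

Each row of `products` corresponds to one stored value and each column to one incoming value, so the result has as many entries as there would be multipliers in this simplified model.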