Patents by Inventor Martin Langhammer

Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250251910
    Abstract: Integrated circuit devices, methods, and circuitry for an efficient multiplier are provided. Multiplier circuitry to multiply a multiplicand value with a multiplier value may include, among other things, input circuitry and carry-based coding circuitry. The input circuitry may receive the multiplicand value and the multiplier value. The carry-based coding circuitry may receive bits of the multiplier value and generate multiplication codes using a carry-based coding scheme that includes multiplication codes according to a Booth's coding scheme but with at least one multiplication code that is removed and replaced with another at least one multiplication code with a different value. A first encoder of the carry-based coding circuitry may receive a carry signal to adjust a multiplication code value of the first encoder based on a second encoder of the carry-based coding circuitry encoding the multiplication code with the different value.
    Type: Application
    Filed: March 28, 2024
    Publication date: August 7, 2025
    Inventors: Igor Viktorovich Kucherenko, Bogdan Pasca, Martin Langhammer
  • Publication number: 20250238232
    Abstract: The present disclosure relates to an integrated circuit device that includes a plurality of vector registers configurable to store a plurality of vectors and switch circuitry communicatively coupled to the plurality of vector registers. The switch circuitry is configurable to route a portion of the plurality of vectors. Additionally, the integrated circuit device includes a plurality of vector processing units communicatively coupled to the switch circuitry. The plurality of vector processing units is configurable to receive the portion of the plurality of vectors, perform one or more operations involving the portion of the plurality of vector inputs, and output a second plurality of vectors generated by performing the one or more operations.
    Type: Application
    Filed: March 5, 2025
    Publication date: July 24, 2025
    Inventors: Martin Langhammer, Eriko Nurvitadhi, Gregg William Baeckler
  • Publication number: 20250217109
    Abstract: Integrated circuit devices, methods, and circuitry for an efficient multiplier are provided. Multiplier circuitry to multiply a multiplicand value with a multiplier value may include, among other things, decoding circuitry, tripler circuitry, and partial product multiplexing circuitry. The decoding circuitry may decode bits of the multiplier value using a decoding scheme that includes at least a coding that indicates a triple, the tripler circuitry may generate a triple of the multiplicand value and may include circuitry to generate the triple of the multiplicand value that sums at least two different vectors, and the partial product multiplexing circuitry may select the triple of the multiplicand as a partial product when the coding indicates the triple.
    Type: Application
    Filed: December 28, 2023
    Publication date: July 3, 2025
    Inventors: Igor Viktorovich Kucherenko, Martin Langhammer
  • Publication number: 20250208833
    Abstract: Techniques for handling block format floating point and/or integer numbers are described. In some examples, circuitry for handling block format floating point and/or integer numbers includes a plurality of multiplexers to select between the output of the mantissa multiplier circuits and outputs of the shift circuits to allow for support for block and non-block numbers.
    Type: Application
    Filed: December 30, 2023
    Publication date: June 26, 2025
    Inventors: Martin Langhammer, Alexander Heinecke
  • Publication number: 20250208865
    Abstract: Techniques for converting block format numbers are described. In some examples, a single instruction is used that is to at least include fields for an opcode, a first source operand, and a destination operand, wherein the opcode is to at least indicate execution circuitry is to perform a conversion of one or more block format numbers associated with the first source operand that each have a value of a scale multiplied by a value of a data element to a non-block format and store the non-block formatted numbers in the destination operand.
    Type: Application
    Filed: December 30, 2023
    Publication date: June 26, 2025
    Inventors: Alexander Heinecke, Martin Langhammer
  • Publication number: 20250208864
    Abstract: Techniques for dot products using block format numbers are described. In some examples, a single instruction including one or more fields for an identifier of at least a first source operand, one or more fields for an identifier of a second source operand, and one or more fields for an identifier of a destination operand, and a field for an opcode, the opcode to at least indicate execution circuitry is to perform a dot product utilizing data that is in the block format to encode one or more numbers, wherein a block number of the block format has a value of a scale multiplied by a value of a scalar element and wherein the data that is in the block format is to use data from at least the first and second source operands.
    Type: Application
    Filed: December 30, 2023
    Publication date: June 26, 2025
    Inventors: Alexander Heinecke, Martin Langhammer
  • Patent number: 12340219
    Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
    Type: Grant
    Filed: February 7, 2024
    Date of Patent: June 24, 2025
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl
  • Publication number: 20250199762
    Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry. In some embodiments, the multiplication is implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry.
    Type: Application
    Filed: March 3, 2025
    Publication date: June 19, 2025
    Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
  • Publication number: 20250190523
    Abstract: SoftMax operation is one part of a deep neural network (DNN). Because computing SoftMax is complex and time-consuming, the SoftMax operation can limit the overall execution latency of the DNN. To address this issue, an in-line data path is added to pass output data from a matrix-to-matrix multiplication core to a hardware SoftMax accelerator. During a denominator phase of the SoftMax operation, the SoftMax accelerator can operate in-line to produce a denominator value using output values generated by the matrix-to-matrix multiplication core and received over the in-line data path. During a numerator phase of the SoftMax operation, the SoftMax accelerator can calculate SoftMax outputs using output values generated by the matrix-to-matrix multiplication core and retrieved from a memory. In other words, the SoftMax accelerator can produce partial results while the matrix-to-matrix multiplication is in-flight to cut down overall latency and reduce memory transactions.
    Type: Application
    Filed: February 18, 2025
    Publication date: June 12, 2025
    Applicant: Intel Corporation
    Inventors: Kamlesh Pillai, Bogdan Pasca, Martin Langhammer
  • Publication number: 20250123804
    Abstract: Integrated circuits with dot product circuitry are provided. The dot product circuitry may be configured to generate partial products of different ranks based on the inputs. The partial products may be organized into corresponding groups based on their ranks. Each group of partial products having the same rank can then be compressed using a compressor/reduction tree. At least some of the compressed partial product values may be shifted between the different groups to maintain the proper offset. Each partial product may have an associated one's to two's complement conversion bit. The conversion bits of the various partial product groups can be separately aggregated and then injected into the compressor tree at one or more locations.
    Type: Application
    Filed: December 26, 2024
    Publication date: April 17, 2025
    Inventor: Martin Langhammer
  • Patent number: 12254316
    Abstract: The present disclosure relates to an integrated circuit device that includes a plurality of vector registers configurable to store a plurality of vectors and switch circuitry communicatively coupled to the plurality of vector registers. The switch circuitry is configurable to route a portion of the plurality of vectors. Additionally, the integrated circuit device includes a plurality of vector processing units communicatively coupled to the switch circuitry. The plurality of vector processing units is configurable to receive the portion of the plurality of vectors, perform one or more operations involving the portion of the plurality of vector inputs, and output a second plurality of vectors generated by performing the one or more operations.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: March 18, 2025
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Eriko Nurvitadhi, Gregg William Baeckler
  • Publication number: 20250060940
    Abstract: A data processing unit may include a memory, processing elements (PEs), and a control unit. The memory may store weight blocks within a weight tensor of a neural network operation. Each weight block has an input channel (IC) dimension and an output channel (OC) dimension and includes subblocks. A subblock includes one or more weights having a first data precision and one or more other weights having a second data precision. The second data precision is lower than the first data precision. The control unit may distribute different ones of the subblocks to different ones of the PEs. A PE may receive a subblock and perform a first MAC operation on a weight having a first data precision and a second MAC operation on a weight having a second data precision. The first MAC operation may consume more computation cycles or more multipliers than the second MAC operation.
    Type: Application
    Filed: October 30, 2024
    Publication date: February 20, 2025
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Michael Wu, Deepak Abraham Mathaikutty, Daksha Sharma, Martin Langhammer
  • Publication number: 20250045017
    Abstract: Integrated circuit devices and circuitry for implementing and using efficient circuitry for summation of tensors having shared exponents and conversion into a floating-point format are provided. Such circuitry may include first input circuitry to receive a first tensor in a fixed-point format having a first shared exponent and second input circuitry to receive a second tensor in the fixed-point format with a second shared exponent. Addition circuitry may add the first tensor and the second tensor, without first converting the first tensor and the second tensor to a floating-point format, to obtain a result in the floating-point format.
    Type: Application
    Filed: September 27, 2024
    Publication date: February 6, 2025
    Inventors: Martin Langhammer, Bogdan Pasca, Dongdong Chen, Ilya Ganusov
  • Publication number: 20250021305
    Abstract: Integrated circuit devices, methods, and circuitry for implementing filters based on multipliers in tensor circuits are provided. Integrated circuitry may include a first tensor circuit with a first set of multipliers of a first precision and first summation circuitry and a second tensor circuit with a second set of multipliers of a second precision and second summation circuitry. The first tensor circuit and the second tensor circuit may collectively perform a multiplication operation at a third precision higher than the first precision and the second precision.
    Type: Application
    Filed: September 27, 2024
    Publication date: January 16, 2025
    Inventors: Martin Langhammer, Volker Mauer, Gregory Ives, Dongdong Chen, Bogdan Pasca
  • Patent number: 12197888
    Abstract: Integrated circuits with dot product circuitry are provided. The dot product circuitry may be configured to generate partial products of different ranks based on the inputs. The partial products may be organized into corresponding groups based on their ranks. Each group of partial products having the same rank can then be compressed using a compressor/reduction tree. At least some of the compressed partial product values may be shifted between the different groups to maintain the proper offset. Each partial product may have an associated one's to two's complement conversion bit. The conversion bits of the various partial product groups can be separately aggregated and then injected into the compressor tree at one or more locations.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: January 14, 2025
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Publication number: 20250013431
    Abstract: Integrated circuit devices, methods, and circuitry for implementing and using a hybrid modular multiplier circuit using a number of different modular reduction techniques are provided. Integrated circuitry may include multiplication circuitry to multiply an input multiplicand value with an input multiplier value to obtain a product, first coarse-grain modular reduction circuitry to partially reduce the product based on a modulus value using a first type of modular reduction, second coarse-grain modular reduction circuitry to further reduce the product based on the modulus value using a second type of modular reduction, and fine-grain modular reduction circuitry to finally reduce the product based on the modulus value using a third type of modular reduction to produce a final modular reduction result.
    Type: Application
    Filed: September 26, 2024
    Publication date: January 9, 2025
    Inventors: Sergey Vladimirovich Gribok, Martin Langhammer, Bogdan Pasca
  • Patent number: 12182534
    Abstract: A digital signal processing (DSP) block includes a plurality of multipliers and a summation block separate from the plurality of multipliers. The DSP block is configurable to perform a first multiplication operation to determine a first product of a first floating-point value and a second floating-point value using only a first multiplier of the plurality of multipliers. Additionally, the DSP block is configurable to perform a second multiplication operation between a third floating-point value and a fourth floating-point value by receiving, at each of the plurality of multipliers, two integer values generated from the third floating-point value and the fourth floating-point value, generating, via the plurality of multipliers, a plurality of subproducts by multiplying, at each of the multipliers, the two integer values, and generating a second product of the second multiplication operation by adding, via the summation block, the plurality of subproducts.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: December 31, 2024
    Assignee: Intel Corporation
    Inventor: Martin Langhammer
  • Publication number: 20240394448
    Abstract: A method for implementing a programmable device is provided. The method may include extracting an underlay from an existing routing network on the programmable device and then mapping a user design to the extracted underlay. The underlay may represent a subset of fast routing wires satisfying predetermined constraints. The underlay may be composed of multiple repeating adjacent logic blocks, each implementing some datapath reduction operation. Implementing circuit designs in this way can dramatically improve circuit performance while cutting down compile times by more than half.
    Type: Application
    Filed: August 6, 2024
    Publication date: November 28, 2024
    Inventors: Gregg William Baeckler, Martin Langhammer
  • Patent number: 12135955
    Abstract: An integrated circuit device includes multiplier circuitry configured to determine a plurality of columns of subproducts by multiplying a plurality of values. Each column of the plurality of columns includes one or more subproducts of a plurality of subproducts. The integrated circuit device also includes adder circuitry configured to determine a plurality of sums, each sum being a sum of one column of the plurality of columns. A first portion of the adder circuitry associated with a first column of the plurality of columns is configured to receive a first value and second value that are associated with the first column and a third value associated with a second column of the plurality of columns that differs from the first column. The third value is a carry-out value generated by a second portion of the adder circuitry associated with the second column of the plurality of columns.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: November 5, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Mihai Pasca
  • Publication number: 20240345804
    Abstract: The present disclosure relates generally to techniques for adjusting the number representation (e.g., format) of a variable before and/or after performing one or more arithmetic operations on the variable. In particular, the present disclosure relates to scaling the range of a variable to a suitable representation based on available hardware (e.g., hard logic) in an integrated circuit device. For example, an input in a first number format (e.g., bfloat16) may be scaled to a second number format (e.g., half-precision floating-point) so that circuitry implemented to receive inputs in the second number format may perform one or more arithmetic operations on the input. Further, the output produced by the circuitry may be scaled back to the first number format. Accordingly, arithmetic operations, such as a dot-product, performed in a first format may be emulated by scaling the inputs to and/or the outputs from arithmetic operations performed in another format.
    Type: Application
    Filed: June 26, 2024
    Publication date: October 17, 2024
    Inventors: Bogdan Mihai Pasca, Martin Langhammer
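
Illustrative sketch (not taken from any filing): the abstract of publication 20240345804 above describes scaling a value from one number format (e.g., bfloat16) into another (e.g., half-precision floating-point), performing the arithmetic there, and scaling the result back. The Python below is one possible software reading of that idea for a single multiplication. The function names (to_bfloat16, to_half, scaled_product), the truncating bfloat16 conversion, and the power-of-two scaling via frexp/ldexp are all assumptions made for illustration; the application does not specify them here.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Truncate x to bfloat16 precision (assumption: keep the top 16 bits of float32)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def to_half(x: float) -> float:
    """Round x to IEEE half precision; raises OverflowError if x is out of range."""
    return struct.unpack(">e", struct.pack(">e", x))[0]


def scaled_product(a: float, b: float) -> float:
    """Multiply two bfloat16 values using only half-precision arithmetic.

    Each input is scaled by a power of two into [0.5, 1), which always fits
    the half-precision range. The product is formed in half precision and
    then scaled back by the combined exponent (hypothetical scheme).
    """
    ma, ea = math.frexp(a)  # a == ma * 2**ea
    mb, eb = math.frexp(b)  # b == mb * 2**eb
    # A bfloat16 significand has 8 bits, so ma and mb are exact in half precision.
    p = to_half(to_half(ma) * to_half(mb))
    # Undo the scaling and return to the original format.
    return to_bfloat16(math.ldexp(p, ea + eb))


if __name__ == "__main__":
    a = to_bfloat16(2.0 ** 70)          # far outside the half-precision range
    b = to_bfloat16(1.5 * 2.0 ** -60)
    print(scaled_product(a, b))
```

In this sketch, 2**70 cannot be converted to half precision directly (to_half raises OverflowError), yet the product is still computed with half-precision operations because only the scaled significands pass through them.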