Patents by Inventor Martin Langhammer
Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240126558
Abstract: A processor includes a shared memory, and an instruction unit to receive a single instruction, multiple thread (SIMT) instruction having a first source register identifier and a second source register identifier. The SIMT instruction indicates a number of data values to be written to the shared memory concurrently. A SIMT processor includes processor elements each to execute instructions of a different corresponding thread of a parallel thread group. Each of a number of processor elements, equal in number to the number of data values, is to execute the SIMT instruction to concurrently write a different corresponding one of the number of data values from a first source register of the respective processor element identified by the first source register identifier to the shared memory at an address based on address information from a second source register of the respective processor element identified by the second source register identifier.
Type: Application
Filed: December 20, 2023
Publication date: April 18, 2024
Inventor: Martin LANGHAMMER
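The behavior this abstract describes can be illustrated with a small software model. The sketch below is a toy simulation, not Intel's hardware: each of the first `count` "processor elements" writes the value in its first source register to shared memory at the address held in its second source register (the register names `r1`/`r2` and the data layout are our assumptions).

```python
def simt_store(shared_mem, threads, count, src1="r1", src2="r2"):
    """Toy model of the described SIMT store: the first `count`
    threads each write regs[src1] to shared_mem[regs[src2]];
    remaining threads are inactive for this instruction."""
    for regs in threads[:count]:
        shared_mem[regs[src2]] = regs[src1]
    return shared_mem

# Example: four threads, three of them active for this instruction.
threads = [{"r1": 10 * i, "r2": i} for i in range(4)]
mem = simt_store([0] * 8, threads, count=3)
```

In hardware these writes happen concurrently; the sequential loop here only models the net effect on shared memory.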
-
Patent number: 11960853
Abstract: Folded integer multiplier (FIM) circuitry includes a multiplier configurable to perform multiplication and a first addition/subtraction unit and a second addition/subtraction unit both configurable to perform addition and subtraction. The FIM circuitry is configurable to determine each product of a plurality of products for a plurality of pairs of input values having a first number of bits by performing, using the first and second addition/subtraction units, a plurality of operations involving addition or subtraction, and performing, using the multiplier, a plurality of multiplication operations involving values having fewer bits than the first number of bits. The plurality of multiplication operations includes a first number of multiplication operations, and the multiplier is configurable to begin performing all multiplication operations of the plurality of multiplication operations within a first number of clock cycles equal to the first number of multiplication operations.
Type: Grant
Filed: March 26, 2021
Date of Patent: April 16, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Bogdan Mihai Pasca
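The general shape of this technique — computing a wide product from narrower multiplications plus extra addition/subtraction work — is classically illustrated by one Karatsuba step. The sketch below is that textbook decomposition, offered only as an analogy for the trade-off the abstract describes, not as the patent's actual folding scheme.

```python
def karatsuba_step(a, b, half_bits=16):
    """One Karatsuba step: a 2n-bit product from three n-bit
    multiplications plus additions/subtractions. Illustrative of
    trading multiplier width for add/sub work; not the patented
    FIM circuit itself."""
    mask = (1 << half_bits) - 1
    a_hi, a_lo = a >> half_bits, a & mask
    b_hi, b_lo = b >> half_bits, b & mask
    z2 = a_hi * b_hi                               # high halves
    z0 = a_lo * b_lo                               # low halves
    z1 = (a_hi + a_lo) * (b_hi + b_lo) - z2 - z0   # cross terms via add/sub
    return (z2 << (2 * half_bits)) + (z1 << half_bits) + z0

assert karatsuba_step(0x12345678, 0x9ABCDEF0) == 0x12345678 * 0x9ABCDEF0
```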
-
Publication number: 20240118870
Abstract: Integrated circuit devices, methods, and circuitry for a digital signal processing (DSP) block that can selectively perform higher-precision DSP multiplication operations or lower-precision AI tensor multiplication operations. Flexible digital signal processing circuitry may include hardened multipliers, hardened summation circuitry, and an intermediate multiplexer network. The intermediate multiplexer network may be configurable to, in a first configuration, route data between the plurality of hardened multipliers and the hardened summation circuitry to perform a plurality of lower-precision multiplication operations. In a second configuration, the intermediate multiplexer network may route the data between the plurality of hardened multipliers and the hardened summation circuitry to perform at least one higher-precision multiplication operation.
Type: Application
Filed: September 27, 2023
Publication date: April 11, 2024
Inventor: Martin Langhammer
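The "second configuration" idea — ganging several low-precision multipliers into one higher-precision multiply — can be shown with the standard schoolbook decomposition of a 16×16 product into four 8×8 partial products. This is a generic illustration of precision composition, not the block's actual multiplexer network.

```python
def mul16_from_8x8(a, b):
    """Build one 16x16 product from four 8x8 partial products, the
    way a DSP block might combine its low-precision multipliers in a
    higher-precision mode (generic decomposition, not the patented
    routing)."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return ((a_hi * b_hi) << 16) + ((a_hi * b_lo + a_lo * b_hi) << 8) + a_lo * b_lo

assert mul16_from_8x8(0xBEEF, 0xCAFE) == 0xBEEF * 0xCAFE
```

In the first configuration the same four 8×8 multipliers would instead produce four independent low-precision products for tensor workloads.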
-
Publication number: 20240113699
Abstract: Integrated circuit devices, methods, and circuitry for implementing and using a flexible circuit for real and complex filter operations are provided. An integrated circuit may include programmable logic circuitry and digital signal processor (DSP) blocks. The DSP blocks may be configurable to receive inputs from the programmable logic circuitry and may include first and second multiplier pairs. The first multiplier pair may include a first multiplier that may receive a first input and a second input and a second multiplier that may receive the second input and a third input of the inputs. The second multiplier pair may include a third multiplier that may receive the first input or a fourth input and a fifth input and a fourth multiplier that may receive the third input or a fifth input and a sixth input.
Type: Application
Filed: September 30, 2022
Publication date: April 4, 2024
Inventor: Martin Langhammer
-
Publication number: 20240078211
Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
Type: Application
Filed: September 14, 2023
Publication date: March 7, 2024
Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
-
Patent number: 11907719
Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
Type: Grant
Filed: June 26, 2020
Date of Patent: February 20, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl
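"Multiply each value of the first plurality by each value of the second plurality" is an all-pairs (outer-product-style) computation. The sketch below models only that arithmetic pattern sequentially; in the DSP block every product is produced by its own multiplier in the same cycle.

```python
def broadcast_products(weights, activations):
    """All-pairs products as the abstract describes: every stored
    weight multiplied by every incoming value. Hardware does this in
    parallel with one multiplier per pair; this model is sequential."""
    return [[w * a for a in activations] for w in weights]

assert broadcast_products([1, 2], [3, 4, 5]) == [[3, 4, 5], [6, 8, 10]]
```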
-
Patent number: 11899746
Abstract: The present disclosure relates generally to techniques for efficiently performing operations associated with artificial intelligence (AI), machine learning (ML), and/or deep learning (DL) applications, such as training and/or inference calculations, using an integrated circuit device. More specifically, the present disclosure relates to an integrated circuit design implemented to perform these operations with low latency and/or a high bandwidth of data. For example, embodiments of a computationally dense digital signal processing (DSP) circuitry, implemented to efficiently perform one or more arithmetic operations (e.g., a dot-product) on an input, are disclosed. Moreover, embodiments described herein may relate to layout, design, and data scheduling of a processing element array implemented to compute matrix multiplications (e.g., systolic array multiplication).
Type: Grant
Filed: December 23, 2021
Date of Patent: February 13, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Andrei-Mihai Hagiescu-Miriste
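A fixed-width dot-product datapath of the kind the abstract mentions can be modeled as an engine that multiplies a fixed number of element pairs per "cycle" and accumulates the partial sums. The lane count of 4 and the sequential loop are our modeling assumptions, not details from the patent.

```python
def dot_product_unit(a, b, lanes=4):
    """Software stand-in for a hardened dot-product datapath:
    multiply `lanes` element pairs per 'cycle' and accumulate the
    partial sums (lane count is illustrative)."""
    acc = 0
    for i in range(0, len(a), lanes):
        partial = sum(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        acc += partial
    return acc

assert dot_product_unit([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]) == 35
```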
-
Publication number: 20230367547
Abstract: A first storage location is to store a first floating-point data element. The first data element has a sign bit, an N-bit first exponent value, and M bits. A second storage location is to store a second floating-point data element that is to have a same number of bits as the first floating-point data element. The second data element has a sign bit, an N-bit first exponent value, and M bits. The N-bit first exponent value of the second data element is all zeroes and the M bits of the second data element include a significand and a second exponent value. A floating-point arithmetic unit is coupled with the first and second storage locations. The floating-point arithmetic unit is to perform either multiplication or addition on the first and second data elements to generate a result data element based at least in part on the second exponent value.
Type: Application
Filed: June 15, 2023
Publication date: November 16, 2023
Inventor: Martin LANGHAMMER
-
Publication number: 20230368030
Abstract: Weights can be pruned during DNN training to increase sparsity in the weights and reduce the amount of computation required for performing the deep learning operations in DNNs. A DNN layer may have one or more weight tensors corresponding to one or more output channels of the layer. A weight tensor has weights, the values of which are determined by training the DNN. A weight tensor may have a dimension corresponding to the input channels of the layer. The weight tensor may be partitioned into subtensors, each of which has a subset of the input channels. The subtensors may have the same number of input channels. One or more subtensors may be selected, e.g., based on the weights in the one or more subtensors. The weights in a selected subtensor are pruned, e.g., changed to zeros. The weights in an unselected subtensor may be modified by further training the DNN.
Type: Application
Filed: July 25, 2023
Publication date: November 16, 2023
Inventors: Arnab Raha, Michael Wu, Deepak Abraham Mathaikutty, Martin Langhammer, Nihat Tunali
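The partition-select-prune flow reads like structured magnitude pruning. The sketch below is a minimal interpretation with assumed details: groups are ranked by L1 magnitude and the lowest-ranked groups are zeroed (the abstract leaves the selection criterion open, and real training would fine-tune the surviving weights afterward).

```python
def prune_subtensors(weight_rows, group_size, keep):
    """Partition input-channel rows into equal-size groups, rank the
    groups by L1 magnitude, and zero every group outside the `keep`
    largest. A sketch of the described selection, with the L1
    criterion assumed by us."""
    groups = [weight_rows[i:i + group_size]
              for i in range(0, len(weight_rows), group_size)]
    scores = [sum(abs(w) for row in g for w in row) for g in groups]
    keep_idx = set(sorted(range(len(groups)), key=lambda i: -scores[i])[:keep])
    pruned = []
    for gi, g in enumerate(groups):
        if gi in keep_idx:
            pruned.extend(g)                       # survives pruning
        else:
            pruned.extend([[0] * len(row) for row in g])  # pruned to zeros
    return pruned

# Four input channels in groups of two; keep only the heavier group.
w = [[1, -1], [2, 0], [10, 3], [-4, 5]]
pruned = prune_subtensors(w, group_size=2, keep=1)
```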
-
Patent number: 11809798
Abstract: The present disclosure describes an integrated circuit device that includes a digital signal processing (DSP) block. The DSP block includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Also, the first plurality of inputs, the second plurality of inputs, or both are derived from higher precision values. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
Type: Grant
Filed: June 26, 2020
Date of Patent: November 7, 2023
Assignee: Intel Corporation
Inventors: Martin Langhammer, Simon Peter Finn
-
Publication number: 20230342111
Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
Type: Application
Filed: June 30, 2023
Publication date: October 26, 2023
Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
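The multiply-at-first-precision, accumulate-at-second-precision pattern can be mimicked in software by rounding operands and products to a reduced significand width and then summing the partials exactly. The `round_sig` helper is our crude stand-in for a low-precision float format (real hardware would use IEEE fp16, bf16, or similar), and the 11-bit default is only an example.

```python
import math

def round_sig(x, bits):
    """Round x to `bits` significand bits: a crude stand-in for a
    lower-precision float format (assumption for illustration)."""
    if x == 0:
        return 0.0
    exp = math.floor(math.log2(abs(x)))
    scale = 2.0 ** (bits - 1 - exp)
    return round(x * scale) / scale

def dot_low_then_high(a, b, low_bits=11):
    """Multiply at a first (low) precision, then accumulate the
    partial results at higher precision, echoing the low-precision
    multiplier stage feeding a wider adder stage."""
    partials = [round_sig(round_sig(x, low_bits) * round_sig(y, low_bits),
                          low_bits)
                for x, y in zip(a, b)]
    return math.fsum(partials)  # higher-precision accumulation
```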
-
Patent number: 11797473
Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
Type: Grant
Filed: October 8, 2018
Date of Patent: October 24, 2023
Assignee: Altera Corporation
Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
-
Publication number: 20230333857
Abstract: A processor of an aspect includes an instruction unit to receive a single instruction, multiple thread (SIMT) instruction. The SIMT instruction has at least one field to provide at least one value. The at least one value is to indicate a plurality of threads that are to execute the SIMT instruction. The processor also includes a SIMT processor coupled with the instruction unit. The SIMT processor is to execute the SIMT instruction for each of the plurality of threads. Other processors, methods, systems, and machine-readable media storing such SIMT instructions are also disclosed.
Type: Application
Filed: June 22, 2023
Publication date: October 19, 2023
Inventor: Martin LANGHAMMER
-
Patent number: 11789641
Abstract: A three dimensional circuit system includes a first integrated circuit die having a core logic region that has first memory circuits and logic circuits. The three dimensional circuit system includes a second integrated circuit die that has second memory circuits. The first and second integrated circuit dies are coupled together in a vertically stacked configuration. The three dimensional circuit system includes third memory circuits coupled to the first integrated circuit die. The third memory circuits reside in a plane of the first integrated circuit die. The logic circuits are coupled to access the first, second, and third memory circuits and data can move between the first, second, and third memories. The third memory circuits have a larger memory capacity and a smaller memory access bandwidth than the second memory circuits. The second memory circuits have a larger memory capacity and a smaller memory access bandwidth than the first memory circuits.
Type: Grant
Filed: June 16, 2021
Date of Patent: October 17, 2023
Assignee: Intel Corporation
Inventors: Scott Weber, Jawad Khan, Ilya Ganusov, Martin Langhammer, Matthew Adiletta, Terence Magee, Albert Fazio, Richard Coulson, Ravi Gutala, Aravind Dasu, Mahesh Iyer
-
Publication number: 20230325665
Abstract: Gate switching in deep learning operations can be reduced based on sparsity in the input data. A first element of an activation operand and a first element of a weight operand may be stored in input storage units associated with a multiplier in a processing element. The multiplier computes a product of the two elements, which may be stored in an output storage unit of the multiplier. After detecting that a second element of the activation operand or a second element of the weight operand is zero valued, gate switching is reduced by avoiding at least one gate switching needed for the multiply-accumulation operation. For instance, the input storage units may not be updated. A zero-valued data element may be stored in the output storage unit of the multiplier and used as a product of the second element of the activation operand and the second element of the weight operand.
Type: Application
Filed: May 30, 2023
Publication date: October 12, 2023
Applicant: Intel Corporation
Inventors: Martin Langhammer, Arnab Raha, Martin Power
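A software analogy for this zero-gating: skip the input-register update entirely when either new operand is zero and substitute a known-zero product. The class below is a toy model in which an update counter stands in for gate-switching activity (the counter and class structure are our illustration, not the circuit).

```python
class GatedMultiplier:
    """Multiplier model that skips input-register updates when either
    new operand is zero, reusing a known-zero product instead - a
    software analogy for the gate-switching reduction described."""

    def __init__(self):
        self.a = self.b = self.out = 0
        self.updates = 0  # proxy for switching activity

    def multiply(self, a, b):
        if a == 0 or b == 0:
            return 0            # no register update; product is known
        self.a, self.b = a, b   # registers only toggle for nonzero pairs
        self.out = a * b
        self.updates += 1
        return self.out

m = GatedMultiplier()
acc = sum(m.multiply(a, w) for a, w in [(3, 2), (0, 7), (5, 0), (4, 1)])
```

With two of the four operand pairs containing a zero, only two register updates occur while the accumulated result is unchanged.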
-
Publication number: 20230273770
Abstract: Integrated circuit devices, methods, and circuitry for implementing and using an iterative multiplicative modular reduction circuit are provided. Such circuitry may include polynomial multiplication circuitry and modular reduction circuitry that may operate concurrently. The polynomial multiplication circuitry may multiply a first input value to a second input value to compute a product. The modular reduction circuitry may perform modular reduction on a first component of the product while the polynomial multiplication circuitry is still generating other components of the product.
Type: Application
Filed: March 16, 2023
Publication date: August 31, 2023
Inventors: Sergey Vladimirovich Gribok, Martin Langhammer, Bogdan Pasca
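The two operations being overlapped are standard: carry-less (GF(2) polynomial) multiplication and polynomial modular reduction. The sketch below implements both over Python integers used as bit-polynomials, but runs them in sequence; the patented circuit's contribution is reducing early product components while later ones are still being generated.

```python
def clmul(a, b):
    """Carry-less multiplication of bit-polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(p, mod):
    """Reduce polynomial p modulo `mod` over GF(2)."""
    dm = mod.bit_length() - 1
    while p.bit_length() - 1 >= dm:
        p ^= mod << (p.bit_length() - 1 - dm)
    return p

def modmul(a, b, mod):
    """Polynomial modular multiplication (sequential reference; the
    described circuit overlaps the two phases)."""
    return poly_mod(clmul(a, b), mod)

# In GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1 (0x11B),
# {53} and {CA} are multiplicative inverses.
assert modmul(0x53, 0xCA, 0x11B) == 1
```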
-
Patent number: 11726744
Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
Type: Grant
Filed: March 26, 2021
Date of Patent: August 15, 2023
Assignee: Intel Corporation
Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
-
Publication number: 20230239136
Abstract: Integrated circuits, methods, and circuitry are provided for performing multiplication such as that used in Galois field counter mode (GCM) hash computations. An integrated circuit may include selection circuitry to provide one of several powers of a hash key. A Galois field multiplier may receive the one of the powers of the hash key and a hash sequence and generate one or more values. The Galois field multiplier may include multiple levels of pipeline stages. An adder may receive the one or more values and provide a summation of the one or more values in computing a GCM hash.
Type: Application
Filed: March 31, 2023
Publication date: July 27, 2023
Inventors: Sergey Vladimirovich Gribok, Gregg William Baeckler, Bogdan Pasca, Martin Langhammer
-
Publication number: 20230229917
Abstract: A compute block can perform hybrid multiply-accumulate (MAC) operations. The compute block may include a weight compression module and a processing element (PE) array. The weight compression module may select a first group of one or more weights and a second group of one or more weights from a weight tensor of a DNN (deep neural network) layer. A weight in the first group is quantized to a power of two value. A weight in the second group is quantized to an integer. The integer and the exponent of the power of two value may be stored in a memory in lieu of the original values of the weights. A PE in the PE array includes a shifter configured to shift an activation of the layer by the exponent of the power of two value and a multiplier configured to multiply the integer with another activation of the layer.
Type: Application
Filed: March 15, 2023
Publication date: July 20, 2023
Applicant: Intel Corporation
Inventors: Michael Wu, Arnab Raha, Deepak Abraham Mathaikutty, Nihat Tunali, Martin Langhammer
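The two weight groups map to two arithmetic paths: a shifter for power-of-two weights (stored as exponents) and an integer multiplier for the rest. The sketch below assumes a simple tagged representation of the compressed weights; the `("pow2", exponent)` / `("int", value)` encoding is our own illustration.

```python
def hybrid_mac(activations, weights):
    """Hybrid MAC sketch: weights tagged 'pow2' carry an exponent and
    use a shift; weights tagged 'int' use a multiplier - echoing the
    two quantization groups in the abstract (tagging scheme assumed)."""
    acc = 0
    for a, (kind, value) in zip(activations, weights):
        if kind == "pow2":
            acc += a << value      # multiply by 2**value via the shifter
        else:
            acc += a * value       # integer multiplier path
    return acc

# Two power-of-two weights (8 and 1, stored as exponents 3 and 0)
# and one integer weight (5).
w = [("pow2", 3), ("int", 5), ("pow2", 0)]
result = hybrid_mac([2, 3, 4], w)  # 2*8 + 3*5 + 4*1
```

Replacing a full multiply with a shift for the power-of-two group is what saves area and energy in the PE.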
-
Publication number: 20230222275
Abstract: A method is provided for processing code for a circuit design for an integrated circuit using a computer system. The method includes receiving at least a portion of the code for the circuit design for the integrated circuit, wherein the portion of the code comprises an error or has incomplete constraints, making an assumption about the error and the missing constraints using a computer-aided design tool, and generating a revised circuit design for the integrated circuit with the error corrected and any missing constraints added based on the assumption and based on the code using the computer-aided design tool and a library of components for circuit designs.
Type: Application
Filed: March 16, 2023
Publication date: July 13, 2023
Applicant: Intel Corporation
Inventors: Gregg Baeckler, Mahesh A. Iyer, Martin Langhammer