Patents by Inventor Martin Langhammer
Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250251910
Abstract: Integrated circuit devices, methods, and circuitry for an efficient multiplier are provided. Multiplier circuitry to multiply a multiplicand value with a multiplier value may include, among other things, input circuitry and carry-based coding circuitry. The input circuitry may receive the multiplicand value and the multiplier value. The carry-based coding circuitry may receive bits of the multiplier value and generate multiplication codes using a carry-based coding scheme that includes multiplication codes according to a Booth's coding scheme but with at least one multiplication code that is removed and replaced with another at least one multiplication code with a different value. A first encoder of the carry-based coding circuitry may receive a carry signal to adjust a multiplication code value of the first encoder based on a second encoder of the carry-based coding circuitry encoding the multiplication code with the different value.
Type: Application
Filed: March 28, 2024
Publication date: August 7, 2025
Inventors: Igor Viktorovich Kucherenko, Bogdan Pasca, Martin Langhammer
-
Publication number: 20250238232
Abstract: The present disclosure relates to an integrated circuit device that includes a plurality of vector registers configurable to store a plurality of vectors and switch circuitry communicatively coupled to the plurality of vector registers. The switch circuitry is configurable to route a portion of the plurality of vectors. Additionally, the integrated circuit device includes a plurality of vector processing units communicatively coupled to the switch circuitry. The plurality of vector processing units is configurable to receive the portion of the plurality of vectors, perform one or more operations involving the portion of the plurality of vector inputs, and output a second plurality of vectors generated by performing the one or more operations.
Type: Application
Filed: March 5, 2025
Publication date: July 24, 2025
Inventors: Martin Langhammer, Eriko Nurvitadhi, Gregg William Baeckler
-
Publication number: 20250217109
Abstract: Integrated circuit devices, methods, and circuitry for an efficient multiplier are provided. Multiplier circuitry to multiply a multiplicand value with a multiplier value may include, among other things, decoding circuitry, tripler circuitry, and partial product multiplexing circuitry. The decoding circuitry may decode bits of the multiplier value using a decoding scheme that includes at least a coding that indicates a triple, the tripler circuitry may generate a triple of the multiplicand value and may include circuitry to generate the triple of the multiplicand value that sums at least two different vectors, and the partial product multiplexing circuitry may select the triple of the multiplicand as a partial product when the coding indicates the triple.
Type: Application
Filed: December 28, 2023
Publication date: July 3, 2025
Inventors: Igor Viktorovich Kucherenko, Martin Langhammer
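As a rough illustration of the tripler idea in the abstract above, the sketch below forms 3x as the sum of two vectors (x and x shifted left by one) and uses it as one selectable partial product. The radix-8 Booth recoding, the 9-bit default width, and all function names are assumptions made for this example; the abstract only states that the decoding scheme includes a code indicating a triple, so this is not the patented circuit.

```python
def triple(x: int) -> int:
    """Form 3*x as the sum of two vectors: x and x shifted left by one."""
    return x + (x << 1)


def radix8_digits(y: int, n_bits: int) -> list:
    """Recode a two's-complement multiplier into radix-8 Booth digits (-4..4).

    Assumes n_bits is a multiple of 3 and y fits in n_bits as a signed value.
    """
    digits = []
    prev = 0  # bit just below the current 3-bit group (0 for the first group)
    for i in range(0, n_bits, 3):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        b2 = (y >> (i + 2)) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2
    return digits


def multiply(x: int, y: int, n_bits: int = 9) -> int:
    """Sum the partial products selected by each radix-8 digit."""
    # Stand-in for the partial product multiplexer: only 3x needs an adder.
    multiples = {0: 0, 1: x, 2: x << 1, 3: triple(x), 4: x << 2}
    total = 0
    for i, digit in enumerate(radix8_digits(y, n_bits)):
        partial = multiples[abs(digit)]
        if digit < 0:
            partial = -partial
        total += partial * (8 ** i)  # weight of the i-th radix-8 digit
    return total
```

In this sketch the multiples 0, x, 2x, and 4x are plain shifts, so the triple is the only multiple that needs a real addition, which is why it gets its own helper.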
-
Publication number: 20250208833
Abstract: Techniques for handling block format floating point and/or integer numbers are described. In some examples, circuitry for handling block format floating point and/or integer numbers includes a plurality of multiplexers to select between the output of the mantissa multiplier circuits and outputs of the shift circuits to allow for support for block and non-block numbers.
Type: Application
Filed: December 30, 2023
Publication date: June 26, 2025
Inventors: Martin Langhammer, Alexander Heinecke
-
Publication number: 20250208865
Abstract: Techniques for converting block format numbers are described. In some examples, a single instruction is used that is to at least include fields for an opcode, a first source operand, and a destination operand, wherein the opcode is to at least indicate execution circuitry is to perform a conversion of one or more block format numbers associated with the first source operand that each have a value of a scale multiplied by a value of a data element to a non-block format and store the non-block formatted numbers in the destination operand.
Type: Application
Filed: December 30, 2023
Publication date: June 26, 2025
Inventors: Alexander Heinecke, Martin Langhammer
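The conversion described in the abstract above, where each block format number has the value of a scale multiplied by a data element, can be sketched in software as follows. The `Block` container, its field names, and the use of one shared scale per block are hypothetical choices for this example; the abstract does not specify an encoding or an instruction layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical block-format container: one scale shared by all elements."""
    scale: float
    elements: List[int]


def block_to_plain(blocks: List[Block]) -> List[float]:
    """Expand each block number to scale * element (the non-block value)."""
    plain = []
    for block in blocks:
        plain.extend(block.scale * element for element in block.elements)
    return plain
```

For instance, a block with scale 0.5 and elements 2, -4, 6 expands to 1.0, -2.0, 3.0.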
-
Publication number: 20250208864
Abstract: Techniques for dot products using block format numbers are described. In some examples, a single instruction includes one or more fields for an identifier of at least a first source operand, one or more fields for an identifier of a second source operand, one or more fields for an identifier of a destination operand, and a field for an opcode. The opcode is to at least indicate that execution circuitry is to perform a dot product utilizing data that is in the block format to encode one or more numbers, wherein a block number of the block format has a value of a scale multiplied by a value of a scalar element, and wherein the data that is in the block format is taken from at least the first and second source operands.
Type: Application
Filed: December 30, 2023
Publication date: June 26, 2025
Inventors: Alexander Heinecke, Martin Langhammer
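Because every block number has the value of a scale multiplied by a scalar element, a dot product of two blocks can be arranged as one integer dot product followed by a single multiplication by the combined scale. The sketch below shows that arithmetic only; the function name, the argument layout, and the assumption of one shared scale per operand are illustrative and are not taken from the application.

```python
from typing import Sequence


def block_dot(scale_a: float, elems_a: Sequence[int],
              scale_b: float, elems_b: Sequence[int]) -> float:
    """Dot product of two block-format vectors with one shared scale each.

    sum((sa * a_i) * (sb * b_i)) == (sa * sb) * sum(a_i * b_i)
    """
    if len(elems_a) != len(elems_b):
        raise ValueError("operands must have the same number of elements")
    integer_dot = sum(a * b for a, b in zip(elems_a, elems_b))
    return (scale_a * scale_b) * integer_dot
```

Factoring the scales out keeps the inner loop in integer arithmetic, which is one plausible reason to use a block format for dot products.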
-
Patent number: 12340219
Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
Type: Grant
Filed: February 7, 2024
Date of Patent: June 24, 2025
Assignee: Altera Corporation
Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl
-
Publication number: 20250199762
Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry. In some embodiments, the multiplication is implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry.
Type: Application
Filed: March 3, 2025
Publication date: June 19, 2025
Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
-
Publication number: 20250190523
Abstract: SoftMax operation is one part of a deep neural network (DNN). Because computing SoftMax is complex and time-consuming, the SoftMax operation can limit the overall execution latency of the DNN. To address this issue, an in-line data path is added to pass output data from a matrix-to-matrix multiplication core to a hardware SoftMax accelerator. During a denominator phase of the SoftMax operation, the SoftMax accelerator can operate in-line to produce a denominator value using output values generated by the matrix-to-matrix multiplication core and received over the in-line data path. During a numerator phase of the SoftMax operation, the SoftMax accelerator can calculate SoftMax outputs using output values generated by the matrix-to-matrix multiplication core and retrieved from a memory. In other words, the SoftMax accelerator can produce partial results while the matrix-to-matrix multiplication is in-flight to cut down overall latency and reduce memory transactions.
Type: Application
Filed: February 18, 2025
Publication date: June 12, 2025
Applicant: Intel Corporation
Inventors: Kamlesh Pillai, Bogdan Pasca, Martin Langhammer
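The two phases named in the abstract above can be mimicked in software: a denominator phase that accumulates the sum of exponentials while values stream in, and a numerator phase that reads the stored values back and divides. In this sketch the Python list standing in for the memory, the iterator standing in for the in-line data path, and the function name are all assumptions; it also omits the maximum-subtraction step that practical SoftMax implementations usually add for numerical stability, since the abstract does not mention it.

```python
import math
from typing import Iterable, List


def softmax_two_phase(stream: Iterable[float]) -> List[float]:
    """SoftMax computed as a streaming denominator pass plus a numerator pass."""
    memory: List[float] = []  # stand-in for the memory written by the matmul core
    denominator = 0.0

    # Denominator phase: consume values as they arrive over the in-line path.
    for value in stream:
        memory.append(value)
        denominator += math.exp(value)

    # Numerator phase: read values back from memory and normalize.
    return [math.exp(value) / denominator for value in memory]
```

The point of the split is that the denominator is already available when the last input arrives, so only the numerator pass has to touch memory afterwards.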
-
Publication number: 20250123804
Abstract: Integrated circuits with dot product circuitry are provided. The dot product circuitry may be configured to generate partial products of different ranks based on the inputs. The partial products may be organized into corresponding groups based on their ranks. Each group of partial products having the same rank can then be compressed using a compressor/reduction tree. At least some of the compressed partial product values may be shifted between the different groups to maintain the proper offset. Each partial product may have an associated one's to two's complement conversion bit. The conversion bits of the various partial product groups can be separately aggregated and then injected into the compressor tree at one or more locations.
Type: Application
Filed: December 26, 2024
Publication date: April 17, 2025
Inventor: Martin Langhammer
-
Patent number: 12254316
Abstract: The present disclosure relates to an integrated circuit device that includes a plurality of vector registers configurable to store a plurality of vectors and switch circuitry communicatively coupled to the plurality of vector registers. The switch circuitry is configurable to route a portion of the plurality of vectors. Additionally, the integrated circuit device includes a plurality of vector processing units communicatively coupled to the switch circuitry. The plurality of vector processing units is configurable to receive the portion of the plurality of vectors, perform one or more operations involving the portion of the plurality of vector inputs, and output a second plurality of vectors generated by performing the one or more operations.
Type: Grant
Filed: March 26, 2021
Date of Patent: March 18, 2025
Assignee: Altera Corporation
Inventors: Martin Langhammer, Eriko Nurvitadhi, Gregg William Baeckler
-
Publication number: 20250060940
Abstract: A data processing unit may include a memory, processing elements (PEs), and a control unit. The memory may store weight blocks within a weight tensor of a neural network operation. Each weight block has an input channel (IC) dimension and an output channel (OC) dimension and includes subblocks. A subblock includes one or more weights having a first data precision and one or more other weights having a second data precision. The second data precision is lower than the first data precision. The control unit may distribute different ones of the subblocks to different ones of the PEs. A PE may receive a subblock and perform a first MAC operation on a weight having a first data precision and a second MAC operation on a weight having a second data precision. The first MAC operation may consume more computation cycles or more multipliers than the second MAC operation.
Type: Application
Filed: October 30, 2024
Publication date: February 20, 2025
Applicant: Intel Corporation
Inventors: Arnab Raha, Michael Wu, Deepak Abraham Mathaikutty, Daksha Sharma, Martin Langhammer
-
Publication number: 20250045017
Abstract: Integrated circuit devices and circuitry for implementing and using efficient circuitry for summation of tensors having shared exponents and conversion into a floating-point format are provided. Such circuitry may include first input circuitry to receive a first tensor in a fixed-point format having a first shared exponent and second input circuitry to receive a second tensor in the fixed-point format with a second shared exponent. Addition circuitry may add the first tensor and the second tensor, without first converting the first tensor and the second tensor to a floating-point format, to obtain a result in the floating-point format.
Type: Application
Filed: September 27, 2024
Publication date: February 6, 2025
Inventors: Martin Langhammer, Bogdan Pasca, Dongdong Chen, Ilya Ganusov
-
Publication number: 20250021305
Abstract: Integrated circuit devices, methods, and circuitry for implementing filters based on multipliers in tensor circuits are provided. Integrated circuitry may include a first tensor circuit with a first set of multipliers of a first precision and first summation circuitry and a second tensor circuit with a second set of multipliers of a second precision and second summation circuitry. The first tensor circuit and the second tensor circuit may collectively perform a multiplication operation at a third precision higher than the first precision and the second precision.
Type: Application
Filed: September 27, 2024
Publication date: January 16, 2025
Inventors: Martin Langhammer, Volker Mauer, Gregory Ives, Dongdong Chen, Bogdan Pasca
-
Patent number: 12197888
Abstract: Integrated circuits with dot product circuitry are provided. The dot product circuitry may be configured to generate partial products of different ranks based on the inputs. The partial products may be organized into corresponding groups based on their ranks. Each group of partial products having the same rank can then be compressed using a compressor/reduction tree. At least some of the compressed partial product values may be shifted between the different groups to maintain the proper offset. Each partial product may have an associated one's to two's complement conversion bit. The conversion bits of the various partial product groups can be separately aggregated and then injected into the compressor tree at one or more locations.
Type: Grant
Filed: December 23, 2019
Date of Patent: January 14, 2025
Assignee: Altera Corporation
Inventor: Martin Langhammer
-
Publication number: 20250013431
Abstract: Integrated circuit devices, methods, and circuitry for implementing and using a hybrid modular multiplier circuit using a number of different modular reduction techniques are provided. Integrated circuitry may include multiplication circuitry to multiply an input multiplicand value with an input multiplier value to obtain a product, first coarse-grain modular reduction circuitry to partially reduce the product based on a modulus value using a first type of modular reduction, second coarse-grain modular reduction circuitry to further reduce the product based on the modulus value using a second type of modular reduction, and fine-grain modular reduction circuitry to finally reduce the product based on the modulus value using a third type of modular reduction to produce a final modular reduction result.
Type: Application
Filed: September 26, 2024
Publication date: January 9, 2025
Inventors: Sergey Vladimirovich Gribok, Martin Langhammer, Bogdan Pasca
-
Patent number: 12182534
Abstract: A digital signal processing (DSP) block includes a plurality of multipliers and a summation block separate from the plurality of multipliers. The DSP block is configurable to perform a first multiplication operation to determine a first product of a first floating-point value and a second floating-point value using only a first multiplier of the plurality of multipliers. Additionally, the DSP block is configurable to perform a second multiplication operation between a third floating-point value and a fourth floating-point value by receiving, at each of the plurality of multipliers, two integer values generated from the third floating-point value and the fourth floating-point value, generating, via the plurality of multipliers, a plurality of subproducts by multiplying, at each of the multipliers, the two integer values, and generating a second product of the second multiplication operation by adding, via the summation block, the plurality of subproducts.
Type: Grant
Filed: June 25, 2021
Date of Patent: December 31, 2024
Assignee: Intel Corporation
Inventor: Martin Langhammer
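One way to picture the second multiplication mode in the abstract above is to split each mantissa into two integer halves, multiply the halves on four small multipliers, and add the weighted subproducts in a separate summation step. The sketch below does this for nonnegative integer mantissas. The 12-bit split, the four-multiplier arrangement, the shift weights, and the function name are assumptions for illustration; the patent abstract does not give these details, and sign and exponent handling are left out.

```python
def split_multiply(a: int, b: int, half_bits: int = 12) -> int:
    """Multiply two nonnegative integers using four half-width subproducts."""
    mask = (1 << half_bits) - 1
    a_hi, a_lo = a >> half_bits, a & mask
    b_hi, b_lo = b >> half_bits, b & mask

    # Each entry models one small multiplier receiving two integer values.
    subproducts = [
        (a_hi * b_hi) << (2 * half_bits),
        (a_hi * b_lo) << half_bits,
        (a_lo * b_hi) << half_bits,
        a_lo * b_lo,
    ]

    # Stand-in for the summation block that adds the subproducts.
    return sum(subproducts)
```

With 24-bit mantissas, for example, each small multiplier in this sketch only ever sees 12-bit operands.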
-
Publication number: 20240394448
Abstract: A method for implementing a programmable device is provided. The method may include extracting an underlay from an existing routing network on the programmable device and then mapping a user design to the extracted underlay. The underlay may represent a subset of fast routing wires satisfying predetermined constraints. The underlay may be composed of multiple repeating adjacent logic blocks, each implementing some datapath reduction operation. Implementing circuit designs in this way can dramatically improve circuit performance while cutting down compile times by more than half.
Type: Application
Filed: August 6, 2024
Publication date: November 28, 2024
Inventors: Gregg William Baeckler, Martin Langhammer
-
Patent number: 12135955
Abstract: An integrated circuit device includes multiplier circuitry configured to determine a plurality of columns of subproducts by multiplying a plurality of values. Each column of the plurality of columns includes one or more subproducts of a plurality of subproducts. The integrated circuit device also includes adder circuitry configured to determine a plurality of sums, each sum being a sum of one column of the plurality of columns. A first portion of the adder circuitry associated with a first column of the plurality of columns is configured to receive a first value and second value that are associated with the first column and a third value associated with a second column of the plurality of columns that differs from the first column. The third value is a carry-out value generated by a second portion of the adder circuitry associated with the second column of the plurality of columns.
Type: Grant
Filed: December 24, 2020
Date of Patent: November 5, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Bogdan Mihai Pasca
-
Publication number: 20240345804
Abstract: The present disclosure relates generally to techniques for adjusting the number representation (e.g., format) of a variable before and/or after performing one or more arithmetic operations on the variable. In particular, the present disclosure relates to scaling the range of a variable to a suitable representation based on available hardware (e.g., hard logic) in an integrated circuit device. For example, an input in a first number format (e.g., bfloat16) may be scaled to a second number format (e.g., half-precision floating-point) so that circuitry implemented to receive inputs in the second number format may perform one or more arithmetic operations on the input. Further, the output produced by the circuitry may be scaled back to the first number format. Accordingly, arithmetic operations, such as a dot-product, performed in a first format may be emulated by scaling the inputs to and/or the outputs from arithmetic operations performed in another format.
Type: Application
Filed: June 26, 2024
Publication date: October 17, 2024
Inventors: Bogdan Mihai Pasca, Martin Langhammer