Patents by Inventor Martin Langhammer

Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Integrated circuits with machine learning extensions

Patent number: 10970042

Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.

Type: Grant

Filed: September 27, 2018

Date of Patent: April 6, 2021

Assignee: Intel Corporation

Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd

ADDER CIRCUITRY FOR VERY LARGE INTEGERS

Publication number: 20210075425

Abstract: An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs, each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, compute the sum for each segment, and compute the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the same area and performance latency as an adder network that uses infinite speed ripple carry adders.

Type: Application

Filed: November 19, 2020

Publication date: March 11, 2021

Applicant: Intel Corporation

Inventor: Martin Langhammer
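The segmented adder tree described in the abstract above can be illustrated with a small software analogue. This is a minimal sketch, not the patented circuit: the names `split_segments`, `add_node`, and `tree_add`, the segment width, and the segment count are all assumptions chosen for illustration. Each node adds operands segment by segment and counts the carry out of each segment instead of propagating it, and the counts are folded back in only after the last node. In hardware the carries would be computed in parallel with the sums and tallied by population counters; here they are simply derived from the same integer addition.

```python
def split_segments(value, seg_bits, num_segs):
    """Split a non-negative integer into num_segs segments of seg_bits bits each."""
    mask = (1 << seg_bits) - 1
    return [(value >> (i * seg_bits)) & mask for i in range(num_segs)]


def add_node(a, b, seg_bits):
    """Add two (segment_sums, carry_counts) pairs without propagating carries.

    The carry out of each segment is accumulated in a per-segment counter,
    a software stand-in for the population counters named in the abstract.
    """
    mask = (1 << seg_bits) - 1
    sums, counts = [], []
    for sa, sb, ca, cb in zip(a[0], b[0], a[1], b[1]):
        total = sa + sb
        sums.append(total & mask)                    # segment sum, truncated
        counts.append(ca + cb + (total >> seg_bits))  # carries seen so far
    return sums, counts


def tree_add(operands, seg_bits=16, num_segs=8):
    """Sum a non-empty list of integers that each fit in seg_bits * num_segs bits."""
    if not operands:
        raise ValueError("at least one operand is required")
    level = [(split_segments(x, seg_bits, num_segs), [0] * num_segs)
             for x in operands]
    # Reduce pairwise, level by level, like a binary tree of adder nodes.
    while len(level) > 1:
        nxt = [add_node(level[i], level[i + 1], seg_bits)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd operand passes through to the next level
        level = nxt
    sums, counts = level[0]
    # Final combine: each carry count has the weight of the next segment up.
    result = 0
    for i, (s, c) in enumerate(zip(sums, counts)):
        result += (s + (c << seg_bits)) << (i * seg_bits)
    return result
```

Because no carry crosses a segment boundary until the final combine step, the depth of each node is set by the segment width rather than by the full operand width, which is the property the abstract emphasizes.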
Implementation of floating-point trigonometric functions in an integrated circuit device

Patent number: 10942706

Abstract: The present embodiments relate to integrated circuits with circuitry that implements floating-point trigonometric functions. The circuitry may include an approximation circuit that generates an approximation of the output of the trigonometric functions, a storage circuit that stores predetermined output values of the trigonometric functions, and a selector circuit that selects between different possible output values based on a control signal from a control circuit. In some embodiments, the circuitry may include a mapping circuit and a restoration circuit. The mapping circuit may map an input value from an original quadrant of the trigonometric circle to a predetermined input interval, and the restoration circuit may map the output value selected by the selection circuit back to the original quadrant of the trigonometric circle. If desired, the circuitry may be implemented in specialized processing blocks.

Type: Grant

Filed: June 27, 2017

Date of Patent: March 9, 2021

Assignee: Intel Corporation

Inventors: Martin Langhammer, Bogdan Pasca
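The mapping and restoration steps described in the abstract above can be sketched in software for the sine function. This is an illustrative assumption, not the patented circuit: the function names, the choice of the interval [0, π/2], and the Taylor polynomial used as the approximation are all hypothetical, and the storage and selector circuits are omitted.

```python
import math

HALF_PI = math.pi / 2


def approx_sin(r):
    """Approximate sin(r) for r in [0, pi/2] with an odd Taylor polynomial (degree 11)."""
    r2 = r * r
    # Horner evaluation of r - r^3/3! + r^5/5! - ... - r^11/11!
    poly = 1 - r2 / 110
    poly = 1 - r2 / 72 * poly
    poly = 1 - r2 / 42 * poly
    poly = 1 - r2 / 20 * poly
    poly = 1 - r2 / 6 * poly
    return r * poly


def sin_via_quadrant_mapping(x):
    """Evaluate sin(x) by mapping to [0, pi/2] and restoring the quadrant afterwards."""
    # Mapping step: find the quadrant and the offset within it.
    k = math.floor(x / HALF_PI)
    r = x - k * HALF_PI
    quadrant = k % 4
    # In quadrants 1 and 3 the curve is mirrored, so reflect the offset.
    if quadrant in (1, 3):
        r = HALF_PI - r
    y = approx_sin(r)
    # Restoration step: quadrants 2 and 3 lie below the axis.
    return -y if quadrant >= 2 else y
```

A call such as `sin_via_quadrant_mapping(2.0)` should agree with `math.sin(2.0)` to within the truncation error of the polynomial, which is below one part in a million on this interval.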
High performance regularized network-on-chip architecture

Patent number: 10922471

Abstract: Techniques for designing and implementing networks-on-chip (NoCs) are provided. For example, a computer-implemented method for programming a network-on-chip (NoC) onto an integrated circuit includes determining a first portion of a plurality of registers to potentially be included in a NoC design, determining routing information regarding datapaths between registers of the first portion of the plurality of registers, and determining an expected performance associated with the first portion of the plurality of registers. The method also includes determining whether the expected performance is within a threshold range, including the first portion of the plurality of registers and the datapaths in the NoC design after determining that the expected performance is within the threshold range, and generating instructions configured to cause circuitry corresponding to the NoC design to be implemented on the integrated circuit.

Type: Grant

Filed: June 28, 2019

Date of Patent: February 16, 2021

Assignee: Intel Corporation

Inventors: Gregg William Baeckler, Martin Langhammer, Sergey Vladimirovich Gribok

Adder circuitry for very large integers

Patent number: 10873332

Abstract: An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs, each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, compute the sum for each segment, and compute the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the same area and performance latency as an adder network that uses infinite speed ripple carry adders.

Type: Grant

Filed: November 30, 2018

Date of Patent: December 22, 2020

Assignee: Intel Corporation

Inventor: Martin Langhammer

High performance QR decomposition systems and methods

Patent number: 10872130

Abstract: Based on a Modified Gram-Schmidt (MGS) algorithm, QR decomposition techniques are optimized for parallel structures that provide arithmetic-logic unit (ALU) to ALU connectivity. The techniques utilize a different loop organization, but the dependent functional sequences of the algorithm are unchanged, thereby reducing likelihood of affecting error analysis and/or numerical stability. Some integrated circuit devices (e.g., FPGA) may implement hard floating-point (HFP) circuitry, such as a digital signal processing (DSP) block, distributed memories, and/or flexible internal connectivity, which can support the discussed high performance matrix arithmetic.

Type: Grant

Filed: August 31, 2017

Date of Patent: December 22, 2020

Assignee: Intel Corporation

Inventor: Martin Langhammer
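For readers unfamiliar with the Modified Gram-Schmidt (MGS) algorithm that the abstract above builds on, the following is a minimal sketch of the textbook form. It does not reproduce the loop reorganization claimed in the patent; the function name `mgs_qr` and the list-of-rows matrix layout are assumptions made for illustration, and the input is assumed to have full column rank.

```python
import math


def mgs_qr(a):
    """Textbook Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Returns (q, r) with q as m x n rows having orthonormal columns and
    r as an n x n upper-triangular matrix, so that a is approximately q times r.
    """
    m, n = len(a), len(a[0])
    # Work column by column; v[j] holds the current residual of column j.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = [None] * n
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        q_cols[k] = [x / r[k][k] for x in v[k]]
        # Immediately remove the new direction from every remaining column.
        for j in range(k + 1, n):
            r[k][j] = sum(qi * vi for qi, vi in zip(q_cols[k], v[j]))
            v[j] = [vi - r[k][j] * qi for qi, vi in zip(q_cols[k], v[j])]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

The inner loop over `j` contains independent dot products and vector updates, which is the kind of work that parallel multiply-add structures can share out; how the patent schedules that work is not shown here.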
Methods for using a multiplier to support multiple sub-multiplication operations

Patent number: 10871946

Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit (e.g., an 18×18 or 18×19 multiplier circuit) may be used to support two or more smaller multiplication operations sharing one or two sets of multiplier operands, a complex multiplication, and a sum of two multiplications. If the multiplier products overflow and interfere with one another, correction operations can be performed. Partial products from two or more larger multiplier circuits can be used to combine decomposed partial products. A large multiplier circuit can also be used to support two floating-point mantissa multipliers.

Type: Grant

Filed: September 27, 2018

Date of Patent: December 22, 2020

Assignee: Intel Corporation

Inventors: Martin Langhammer, Gregg William Baeckler, Sergey Gribok, Dmitry N. Denisenko, Bogdan Pasca

Method and apparatus for implementing an application aware system on a programmable logic device

Patent number: 10867090

Abstract: A method for designing a system on a target device is disclosed. The system is synthesized from a register transfer level description. The system is placed on the target device. The system is routed on the target device. A configuration file is generated that reflects the synthesizing, placing, and routing of the system for programming the target device. A modification for the system is identified. The configuration file is modified to effectuate the modification for the system without changing the placing and routing of the system.

Type: Grant

Filed: March 18, 2019

Date of Patent: December 15, 2020

Assignee: Intel Corporation

Inventors: Gregg William Baeckler, Martin Langhammer, Sergey Gribok, Scott J. Weber, Gregory Steinke

Fixed-point and floating-point arithmetic operator circuits in specialized processing blocks

Patent number: 10838695

Abstract: The present embodiments relate to circuitry that efficiently performs floating-point arithmetic operations and fixed-point arithmetic operations. Such circuitry may be implemented in specialized processing blocks. If desired, the specialized processing blocks may include configurable interconnect circuitry to support a variety of different use modes. For example, the specialized processing block may efficiently perform a fixed-point or floating-point addition operation or a portion thereof, a fixed-point or floating-point multiplication operation or a portion thereof, a fixed-point or floating-point multiply-add operation or a portion thereof, just to name a few. In some embodiments, two or more specialized processing blocks may be arranged in a cascade chain and perform together more complex operations such as a recursive mode dot product of two vectors of floating-point numbers or a Radix-2 Butterfly circuit, just to name a few.

Type: Grant

Filed: June 4, 2019

Date of Patent: November 17, 2020

Assignee: Altera Corporation

Inventor: Martin Langhammer

FPGA Specialist Processing Block for Machine Learning

Publication number: 20200327271

Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.

Type: Application

Filed: June 26, 2020

Publication date: October 15, 2020

Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl

Systems and Methods for Loading Weights into a Tensor Processing Block

Publication number: 20200326948

Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. In a first mode of operation, the first and second pluralities of values are received via a first portion of the plurality of inputs. In a second mode of operation, the first plurality of values is received via a second portion of the plurality of inputs, and the second plurality of values is received via the first portion of the plurality of inputs. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.

Type: Application

Filed: June 26, 2020

Publication date: October 15, 2020

Inventor: Martin Langhammer

Logic circuits with simultaneous dual function capability

Patent number: 10790829

Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as logic elements. A logic element may include four lookup tables coupled to an adder carry chain. At least some of the lookup tables are configured to output combinatorial outputs, whereas the adder carry chain is used to output sum outputs. Both the combinatorial outputs and the sum outputs may be used simultaneously to support a multiplication operation, three or more logic operations, or arithmetic and combinatorial operations in parallel.

Type: Grant

Filed: September 27, 2018

Date of Patent: September 29, 2020

Assignee: Intel Corporation

Inventors: Martin Langhammer, Sergey Gribok, Gregg William Baeckler

PROGRAMMABLE INTEGRATED CIRCUIT UNDERLAY

Publication number: 20200293707

Abstract: A method for implementing a programmable device is provided. The method may include extracting an underlay from an existing routing network on the programmable device and then mapping a user design to the extracted underlay. The underlay may represent a subset of fast routing wires satisfying predetermined constraints. The underlay may be composed of multiple repeating adjacent logic blocks, each implementing some datapath reduction operation. Implementing circuit designs in this way can dramatically improve circuit performance while cutting down compile times by more than half.

Type: Application

Filed: June 1, 2020

Publication date: September 17, 2020

Applicant: Intel Corporation

Inventors: Gregg William Baeckler, Martin Langhammer

Hazard Mitigation for Lightweight Processor Cores

Publication number: 20200278865

Abstract: Integrated circuits that include lightweight processor cores are provided. Each processor core may be configured to execute a series of instructions. At least one of the instructions may include an embedded delay field with a value specifying the amount of time that instruction needs to wait before proceeding to the next instruction to avoid a data hazard. The value of the delay field may be determined by a compiler during software compile time. Such delay field may also be used in conjunction with branch instructions to specify a number of no-operations (NOPs) for one or more associated branch delay slots and may also be used to reduce data forwarding cost.

Type: Application

Filed: May 15, 2020

Publication date: September 3, 2020

Applicant: Intel Corporation

Inventors: Martin Langhammer, Gregg William Baeckler

Reduced floating-point precision arithmetic circuitry

Patent number: 10761805

Abstract: The present embodiments relate to performing reduced-precision floating-point arithmetic operations using specialized processing blocks with higher-precision floating-point arithmetic circuitry. A specialized processing block may receive four floating-point numbers that represent two single-precision floating-point numbers, each separated into an LSB portion and an MSB portion, or four half-precision floating-point numbers. A first partial product generator may generate a first partial product of first and second input signals, while a second partial product generator may generate a second partial product of third and fourth input signals.

Type: Grant

Filed: September 26, 2018

Date of Patent: September 1, 2020

Assignee: Altera Corporation

Inventor: Martin Langhammer

Methods for using a multiplier circuit to support multiple sub-multiplications using bit correction and extension

Patent number: 10732932

Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit such as an 18×18 multiplier circuit may be used to support two or more smaller multiplication operations such as two 8×8 integer multiplications or two 9×9 integer multiplications. To implement the two 8×8 or 9×9 unsigned/signed multiplications, the 18×18 multiplier may be configured to support two 8×8 multiplications with one shared operand, two 6×6 multiplications without any shared operand, or two 7×7 multiplications without any shared operand. Any potential overlap of partial product terms may be subtracted out using correction logic. The multiplication of the remaining most significant bits can be computed using associated multiplier extension logic and appended to the other least significant bits using merging logic.

Type: Grant

Filed: December 21, 2018

Date of Patent: August 4, 2020

Assignee: Intel Corporation

Inventors: Bogdan Pasca, Martin Langhammer, Sergey Gribok, Gregg William Baeckler

Logic circuits with augmented arithmetic densities

Patent number: 10715144

Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as logic cells. A logic cell may include four 4-input lookup tables (LUTs) coupled to an adder carry chain. Each of the four 4-input LUTs may include two 3-input LUTs and a selector multiplexer. The carry chain may include three or more full adder circuits. The outputs of the 3-input LUTs may be directly connected to inputs of the full adder circuits in the carry chain. By providing at least as many full adder circuits as there are 4-input LUTs in the logic cell, the arithmetic density of the logic is enhanced.

Type: Grant

Filed: June 6, 2019

Date of Patent: July 14, 2020

Assignee: Intel Corporation

Inventors: Sergey Gribok, Gregg Baeckler, Martin Langhammer

Denormalization in multi-precision floating-point arithmetic circuitry

Patent number: 10678510

Abstract: The present embodiments relate to integrated circuits with floating-point arithmetic circuitry that handles normalized and denormalized floating-point numbers. The floating-point arithmetic circuitry may include a normalization circuit and a rounding circuit, and the floating-point arithmetic circuitry may generate a first result in form of a normalized, unrounded floating-point number and a second result in form of a normalized, rounded floating-point number. If desired, the floating-point arithmetic circuitry may be implemented in specialized processing blocks.

Type: Grant

Filed: September 25, 2017

Date of Patent: June 9, 2020

Assignee: Altera Corporation

Inventor: Martin Langhammer

VARIABLE PRECISION FLOATING-POINT MULTIPLIER

Publication number: 20200174750

Abstract: Integrated circuits with specialized processing blocks are provided. The specialized processing blocks may include floating-point multiplier circuits that can be configured to support variable precision. A multiplier circuit may include a first carry-propagate adder (CPA), a second carry-propagate adder (CPA), and an associated rounding circuit. The first CPA may be wide enough to handle the required precision of the mantissa. In a bridged mode, the first CPA may borrow an additional bit from the second CPA while the rounding circuit will monitor the appropriate bits to select the proper multiplier output. A parallel prefix tree operable in a non-bridged mode or the bridged mode may be used to compute multiple multiplier outputs. The multiplier circuit may also include exponent and exception handling circuitry using various masks corresponding to the desired precision width.

Type: Application

Filed: February 10, 2020

Publication date: June 4, 2020

Inventor: Martin Langhammer

Integrated circuits with specialized processing blocks for performing floating-point fast fourier transforms and complex multiplication

Patent number: 10649731

Abstract: Integrated circuits with specialized processing blocks are provided. A specialized processing block may include one real addition stage and one real multiplier stage. The multiplier stage may simultaneously feed its output to the addition stage and directly to an adjacent specialized processing block. The addition stage may also produce sum and difference outputs in parallel. A group of four such specialized processing blocks may be connected in a chain to implement a radix-2 fast Fourier transform (FFT) butterfly. Multiple radix-2 butterflies may be stacked to form yet higher order radix butterflies. If desired, the specialized processing block may also be used to implement a complex multiply operation. Three or four specialized processing blocks may be chained together and, along with one or more adders outside the specialized processing blocks, may generate the real and imaginary portions of a complex product.

Type: Grant

Filed: October 23, 2018

Date of Patent: May 12, 2020

Assignee: Altera Corporation

Inventor: Martin Langhammer
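The abstract above mentions building a complex product from three or four real multipliers plus extra adders, and a radix-2 butterfly that produces sum and difference outputs. The sketch below shows one well-known way to form a complex product with three real multiplications and how a butterfly uses it. Whether the patent uses this particular factorization is an assumption; the function names and the tuple representation of complex values are hypothetical.

```python
def complex_mul3(a, b, c, d):
    """Return (real, imag) of (a + bi) * (c + di) using three real multiplications."""
    k1 = c * (a + b)   # multiplication 1
    k2 = a * (d - c)   # multiplication 2
    k3 = b * (c + d)   # multiplication 3
    # k1 - k3 = ac - bd and k1 + k2 = ad + bc; only additions remain.
    return k1 - k3, k1 + k2


def radix2_butterfly(x, y, w):
    """Radix-2 butterfly on (real, imag) tuples: returns (x + w*y, x - w*y)."""
    tr, ti = complex_mul3(w[0], w[1], y[0], y[1])
    total = (x[0] + tr, x[1] + ti)       # sum output
    difference = (x[0] - tr, x[1] - ti)  # difference output
    return total, difference
```

For example, `complex_mul3(1, 2, 3, 4)` returns `(-5, 10)`, matching (1 + 2i)(3 + 4i) = -5 + 10i. The three-multiplier form trades one multiplication for three additional additions, which is consistent with the abstract's mention of adders outside the processing blocks.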
