Patents by Inventor Martin Langhammer

Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200174750
    Abstract: Integrated circuits with specialized processing blocks are provided. The specialized processing blocks may include floating-point multiplier circuits that can be configured to support variable precision. A multiplier circuit may include a first carry-propagate adder (CPA), a second carry-propagate adder (CPA), and an associated rounding circuit. The first CPA may be wide enough to handle the required precision of the mantissa. In a bridged mode, the first CPA may borrow an additional bit from the second CPA while the rounding circuit will monitor the appropriate bits to select the proper multiplier output. A parallel prefix tree operable in a non-bridged mode or the bridged mode may be used to compute multiple multiplier outputs. The multiplier circuit may also include exponent and exception handling circuitry using various masks corresponding to the desired precision width.
    Type: Application
    Filed: February 10, 2020
    Publication date: June 4, 2020
    Inventor: Martin Langhammer
  • Patent number: 10649731
    Abstract: Integrated circuits with specialized processing blocks are provided. A specialized processing block may include one real addition stage and one real multiplier stage. The multiplier stage may simultaneously feed its output to the addition stage and directly to an adjacent specialized processing block. The addition stage may also produce sum and difference outputs in parallel. A group of four such specialized processing blocks may be connected in a chain to implement a radix-2 fast Fourier transform (FFT) butterfly. Multiple radix-2 butterflies may be stacked to form yet higher order radix butterflies. If desired, the specialized processing block may also be used to implement a complex multiply operation. Three or four specialized processing blocks may be chained together and along with one or more adders outside the specialized processing blocks, real and imaginary portions of a complex product can be generated.
    Type: Grant
    Filed: October 23, 2018
    Date of Patent: May 12, 2020
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Publication number: 20200142671
    Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit such as an 18×18 multiplier circuit may be used to support two or more smaller multiplication operations such as two 8×8 integer multiplications or two 9×9 integer multiplications. To implement the two 8×8 or 9×9 unsigned/signed multiplications, the 18×18 multiplier may be configured to support two 8×8 multiplications with one shared operand, two 6×6 multiplications without any shared operand, or two 7×7 multiplications without any shared operand. Any potential overlap of partial product terms may be subtracted out using correction logic. The multiplication of the remaining most significant bits can be computed using associated multiplier extension logic and appended to the other least significant bits using merging logic.
    Type: Application
    Filed: December 21, 2018
    Publication date: May 7, 2020
    Applicant: Intel Corporation
    Inventors: Bogdan Pasca, Martin Langhammer, Sergey Gribok, Gregg William Baeckler
  • Publication number: 20200125329
    Abstract: Integrated circuits with dot product circuitry are provided. The dot product circuitry may be configured to generate partial products of different ranks based on the inputs. The partial products may be organized into corresponding groups based on their ranks. Each group of partial products having the same rank can then be compressed using a compressor/reduction tree. At least some of the compressed partial product values may be shifted between the different groups to maintain the proper offset. Each partial product may have an associated one's to two's complement conversion bit. The conversion bits of the various partial product groups can be separately aggregated and then injected into the compressor tree at one or more locations.
    Type: Application
    Filed: December 23, 2019
    Publication date: April 23, 2020
    Applicant: Intel Corporation
    Inventor: Martin Langhammer
  • Publication number: 20200125780
    Abstract: Methods and apparatus for increasing the random logic utilization on a programmable device are provided. Although not completely homogeneous, the programmable device has many components that are repeated many times in an array. To help improve repeatability and packing, computer-aided design tools for compiling a circuit design for the programmable device may first lock down a synthesis cell netlist with stable naming, create location solution files (files with desired clustering granularity for stabilizing performance and reducing compile times) for selected regions of interest on the programmable device, and compose a final design with only the best solutions some of which can be imported from one location to another. Compiling a design in this way can help improve random logic utilization beyond 85% while improving circuit performance by 20% or more.
    Type: Application
    Filed: December 19, 2019
    Publication date: April 23, 2020
    Applicant: Intel Corporation
    Inventors: Gregg William Baeckler, Martin Langhammer
  • Patent number: 10613831
    Abstract: A specialized processing block on an integrated circuit includes a first and second arithmetic operator stage, an output coupled to another specialized processing block, and configurable interconnect circuitry which may be configured to route signals throughout the specialized processing block, including in and out of the first and second arithmetic operator stages. The configurable interconnect circuitry may further include multiplexer circuitry to route selected signals. The output of the specialized processing block that is coupled to another specialized processing block together with the configurable interconnect circuitry reduces the need to use resources outside the specialized processing block when implementing mathematical functions that require the use of more than one specialized processing block. An example for such mathematical functions include the implementation of scaled product sum operations and the implementation of Horner's rule.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: April 7, 2020
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Publication number: 20200106442
    Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as a logic element. A logic element may include four lookup tables coupled to an adder carry chain. At least some of the lookup tables are configured to output combinatorial outputs, whereas the adder carry chain are used to output sum outputs. Both the combinatorial outputs and the sum outputs may be used simultaneously to support a multiplication operation, three or more logic operations, or arithmetic and combinatorial operations in parallel.
    Type: Application
    Filed: September 27, 2018
    Publication date: April 2, 2020
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Sergey Gribok, Gregg William Baeckler
  • Patent number: 10574267
    Abstract: Syndrome calculation circuitry for a decoder of codewords having a first number of symbols, where the decoder receives a second number of parallel symbols, and where the first number is not evenly divisible by the second number, includes multipliers equal in number to the second number. Each multiplier multiplies a symbol by a coefficient based on a root of a field of the decoder. The multipliers are divided into a number of groups determined as a function of a modulus of the first number and the second number. Adders equal in number to the groups add outputs of multipliers in respective ones of the groups. Accumulation circuitry accumulates outputs of the adders. Output circuitry adds outputs of the adders to an output of the accumulation circuitry to provide a syndrome. Selection circuitry directs outputs of the adders to the accumulation circuitry or the output circuitry, and resets the accumulation circuitry.
    Type: Grant
    Filed: April 20, 2018
    Date of Patent: February 25, 2020
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Patent number: 10572222
    Abstract: Integrated circuits with specialized processing blocks are provided. The specialized processing blocks may include floating-point multiplier circuits that can be configured to support variable precision. A multiplier circuit may include a first carry-propagate adder (CPA), a second carry-propagate adder (CPA), and an associated rounding circuit. The first CPA may be wide enough to handle the required precision of the mantissa. In a bridged mode, the first CPA may borrow an additional bit from the second CPA while the rounding circuit will monitor the appropriate bits to select the proper multiplier output. A parallel prefix tree operable in a non-bridged mode or the bridged mode may be used to compute multiple multiplier outputs. The multiplier circuit may also include exponent and exception handling circuitry using various masks corresponding to the desired precision width.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: February 25, 2020
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Publication number: 20200026493
    Abstract: Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.
    Type: Application
    Filed: September 27, 2019
    Publication date: January 23, 2020
    Inventors: Keone Streicher, Martin Langhammer, Yi-Wen Lin, Hyun Yi
  • Publication number: 20200026494
    Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry. The hybrid dot-product circuitry has a hard data path that uses digital signal processing (DSP) blocks operating in floating-point mode and a hard/soft data path that uses DSP blocks operating in fixed-point mode operated in conjunction with general purpose soft logic. The hard/soft data path includes 2-element dot-product circuits that feed an adder tree. Results from the hard data path are combined with the adder tree using format conversion and normalization circuitry. Inputs to the hybrid dot-product circuitry may be in the BFLOAT16 format. The hard data path may be in the single precision format. The hard/soft data path uses a custom format that is similar to but different than BFLOAT16.
    Type: Application
    Filed: September 27, 2019
    Publication date: January 23, 2020
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
  • Publication number: 20200004506
    Abstract: An integrated circuit may be provided with a modular multiplication circuit. The modular multiplication circuit may include an input multiplier for computing the product of two input signals, truncated multipliers for computing another product based on a modulus value and the product, a subtraction circuit for computing a difference between the two products. An error correction circuit may use the difference to look up an estimated quotient value and to subtract out an integer multiple of the modulus value from the difference in a single step, wherein the integer multiple is equal to the estimated quotient value. A final adjustment stage may be used to remove any remaining residual estimation error.
    Type: Application
    Filed: September 10, 2019
    Publication date: January 2, 2020
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca
  • Patent number: 10521227
    Abstract: The present embodiments relate to circuitry that efficiently performs double-precision floating-point addition operations, single-precision floating-point addition operations, and fixed-point addition operations. Such circuitry may be implemented in specialized processing blocks. If desired, each specialized processing block may efficiently perform a single-precision floating-point addition operation, and multiple specialized processing blocks may be coupled together to perform a double-precision floating-point addition operation. In some embodiments, four specialized processing blocks that are arranged in a one-way cascade chain may compute the sum of two double-precision floating-point number. If desired, two specialized processing blocks that are arranged in a two-way cascade chain may compute the sum of two double-precision floating-point numbers.
    Type: Grant
    Filed: July 23, 2018
    Date of Patent: December 31, 2019
    Assignee: Intel Corporation
    Inventor: Martin Langhammer
  • Patent number: 10521194
    Abstract: The present embodiments relate to integrated circuits with circuitry that efficiently performs mixed-precision floating-point arithmetic operations. Such circuitry may be implemented in specialized processing blocks. The specialized processing blocks may include configurable interconnect circuitry to support a variety of different use modes. For example, the specialized processing blocks may implement fixed-point addition, floating-point addition, fixed-point multiplication, floating-point multiplication, sum of two multiplications in a first floating-point precision, with or without casting to a second floating-point precision and the latter followed by a subsequent addition in the second floating-point precision, if desired, just to name a few.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: December 31, 2019
    Assignee: Intel Corporation
    Inventor: Martin Langhammer
  • Patent number: 10474429
    Abstract: Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.
    Type: Grant
    Filed: October 21, 2016
    Date of Patent: November 12, 2019
    Assignee: Altera Corporation
    Inventors: Keone Streicher, Martin Langhammer, Yi-Wen Lin, Hyun Yi
  • Publication number: 20190324722
    Abstract: Integrated circuits with specialized processing blocks are provided. The specialized processing blocks may include floating-point multiplier circuits that can be configured to support variable precision. A multiplier circuit may include a first carry-propagate adder (CPA), a second carry-propagate adder (CPA), and an associated rounding circuit. The first CPA may be wide enough to handle the required precision of the mantissa. In a bridged mode, the first CPA may borrow an additional bit from the second CPA while the rounding circuit will monitor the appropriate bits to select the proper multiplier output. A parallel prefix tree operable in a non-bridged mode or the bridged mode may be used to compute multiple multiplier outputs. The multiplier circuit may also include exponent and exception handling circuitry using various masks corresponding to the desired precision width.
    Type: Application
    Filed: June 25, 2019
    Publication date: October 24, 2019
    Inventor: Martin Langhammer
  • Publication number: 20190324724
    Abstract: A computer-implemented method for programming an integrated circuit includes receiving a program design and determining one or more addition operations based on the program design. The method also includes performing geometric synthesis based on the one or more addition operations by determining a plurality of bits associated with the one or more addition operations and defining a plurality of counters that includes the plurality of bits. Furthermore, the method includes generating instructions configured to cause circuitry configured to perform the one or more addition operations to be implemented on the integrated circuit based on the plurality of counters. The circuitry includes first adder circuitry configured to add a portion of the plurality of bits and produce a carry-out value. The circuitry also includes second adder circuitry configured to determine a sum of a second portion of the plurality of bits and the carry-out value.
    Type: Application
    Filed: June 28, 2019
    Publication date: October 24, 2019
    Inventors: Sergey Vladimirovich Gribok, Gregg William Baeckler, Martin Langhammer
  • Publication number: 20190318058
    Abstract: Techniques for designing and implementing networks-on-chip (NoCs) are provided. For example, a computer-implemented method for programming a network-on-chip (NoC) onto an integrated circuit includes determining a first portion of a plurality of registers to potentially be included in a NoC design, determining routing information regarding datapaths between registers of the first portion of the plurality of registers, and determining an expected performance associated with the first portion of the plurality of registers. The method also includes determining whether the expected performance is within a threshold range, including the first portion of the plurality of registers and the datapaths in the NoC design after determining that the expected performance is within the threshold range, and generating instructions configured to cause circuitry corresponding to the NoC design to be implemented on the integrated circuit.
    Type: Application
    Filed: June 28, 2019
    Publication date: October 17, 2019
    Inventors: Gregg William Baeckler, Martin Langhammer, Sergey Vladimirovich Gribok
  • Publication number: 20190310828
    Abstract: An integrated circuit with a large multiplier is provided. The multiplier may be configured to receive large input operands with thousands of bits. The multiplier may be implemented using a multiplier decomposition scheme that is recursively flattened into multiple decomposition levels to expose a tree of adders. The adders may be collapsed into a merged pipelined structure, where partial sums are forwarded from one level to the next while bypassing intervening prefix networks. The final correct sum is not calculated until later. In accordance with the decomposition technique, the partial sums are successively halved, which allows the prefix networks to be smaller from one level to the next. This allows all sums to be calculated at approximately the same pipeline depth, which significantly reduces latency with no or limited pipeline balancing.
    Type: Application
    Filed: June 24, 2019
    Publication date: October 10, 2019
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca
  • Publication number: 20190288688
    Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as a logic cell. A logic cell may include four 4-input lookup tables (LUTs) coupled to an adder carry chain. Each of the four 4-input LUTs may include two 3-input LUTs and a selector multiplexer. The carry chain may include at three or more full adder circuits. The outputs of the 3-input LUTs may be directly connected to inputs of the full adder circuits in the carry chain. By providing at least the same or more number of full adder circuits as the total number of 4-input LUTs in the logic cell, the arithmetic density of the logic is enhanced.
    Type: Application
    Filed: June 6, 2019
    Publication date: September 19, 2019
    Applicant: Intel Corporation
    Inventors: Sergey Gribok, Gregg Baeckler, Martin Langhammer