Patents by Inventor Martin Langhammer

Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10417004
    Abstract: Circuitry operating under a floating-point mode or a fixed-point mode includes a first circuit accepting a first data input and generating a first data output. The first circuit includes a first arithmetic element accepting the first data input, a plurality of pipeline registers disposed in connection with the first arithmetic element, and a cascade register that outputs the first data output. The circuitry further includes a second circuit accepting a second data input and generating a second data output. The second circuit is cascaded to the first circuit such that the first data output is connected to the second data input via the cascade register. The cascade register is selectively bypassed when the first circuit is operated under the fixed-point mode.
    Type: Grant
    Filed: June 28, 2017
    Date of Patent: September 17, 2019
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
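A rough Python model of the cascade behavior described in this entry: two arithmetic stages are chained, and the cascade register between them is bypassed when the circuit operates in fixed-point mode. The class and method names here are illustrative and are not taken from the patent.

```python
# Illustrative model of two cascaded arithmetic circuits. In floating-point mode
# the cascade register adds one cycle of delay; in fixed-point mode it is bypassed.

class CascadeStage:
    """One arithmetic stage with an output cascade register."""

    def __init__(self, op, bypass_cascade_reg=False):
        self.op = op                        # the stage's arithmetic function
        self.bypass = bypass_cascade_reg    # True when in fixed-point mode
        self._reg = 0                       # contents of the cascade register

    def clock(self, data_in):
        """Advance one clock cycle and return what the next stage sees."""
        result = self.op(data_in)
        if self.bypass:
            # Fixed-point mode: forward the result combinationally, same cycle.
            return result
        # Floating-point mode: the result appears on the cascade output one cycle later.
        out, self._reg = self._reg, result
        return out


# The first circuit's data output feeds the second circuit's data input.
first = CascadeStage(lambda x: x * 3, bypass_cascade_reg=True)   # fixed-point mode
second = CascadeStage(lambda x: x + 7, bypass_cascade_reg=True)

for cycle, sample in enumerate([1, 2, 3]):
    print(cycle, second.clock(first.clock(sample)))   # (1*3)+7, (2*3)+7, (3*3)+7
```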
  • Publication number: 20190250886
    Abstract: The present embodiments relate to integrated circuits with floating-point arithmetic circuitry that handles normalized and denormalized floating-point numbers. The floating-point arithmetic circuitry may include a normalization circuit and a rounding circuit, and the floating-point arithmetic circuitry may generate a first result in form of a normalized, unrounded floating-point number and a second result in form of a normalized, rounded floating-point number. If desired, the floating-point arithmetic circuitry may be implemented in specialized processing blocks.
    Type: Application
    Filed: September 25, 2017
    Publication date: August 15, 2019
    Applicant: Altera Corporation
    Inventor: Martin Langhammer
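The normalize-then-round behavior described in this entry can be sketched in software: the function below takes a raw significand and returns both a normalized, unrounded result and a normalized, rounded result, assuming round-to-nearest-even and an 11-bit target mantissa. All names and parameter choices are illustrative, not the patent's.

```python
# Sketch of a normalization circuit followed by a rounding circuit, producing
# both a normalized-unrounded and a normalized-rounded floating-point result.

def normalize_and_round(significand, exponent, p=11):
    """significand: raw unsigned integer; value = significand * 2**exponent.
    Returns (unrounded, rounded), each a (mantissa, exponent) pair with a
    p-bit mantissa, using round-to-nearest-even."""
    if significand == 0:
        return (0, exponent), (0, exponent)

    # Normalization: shift so exactly p significant bits remain.
    shift = significand.bit_length() - p
    exponent += shift
    if shift > 0:
        kept = significand >> shift
        discarded = significand & ((1 << shift) - 1)
        half = 1 << (shift - 1)
    else:
        # Nothing is discarded; half = 1 acts as a sentinel so no rounding occurs.
        kept, discarded, half = significand << -shift, 0, 1
    unrounded = (kept, exponent)

    # Rounding: round to nearest, ties to even.
    if discarded > half or (discarded == half and (kept & 1)):
        kept += 1
        if kept.bit_length() > p:          # rounding overflowed; renormalize
            kept >>= 1
            exponent += 1
    return unrounded, (kept, exponent)


print(normalize_and_round(0b1111111111111, 0))   # ((2047, 2), (1024, 3))
```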
  • Patent number: 10379815
    Abstract: Integrated circuits with specialized processing blocks are provided. The specialized processing blocks may include floating-point multiplier circuits that can be configured to support variable precision. A multiplier circuit may include a first carry-propagate adder (CPA), a second carry-propagate adder (CPA), and an associated rounding circuit. The first CPA may be wide enough to handle the required precision of the mantissa. In a bridged mode, the first CPA may borrow an additional bit from the second CPA while the rounding circuit will monitor the appropriate bits to select the proper multiplier output. A parallel prefix tree operable in a non-bridged mode or the bridged mode may be used to compute multiple multiplier outputs. The multiplier circuit may also include exponent and exception handling circuitry using various masks corresponding to the desired precision width.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: August 13, 2019
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Patent number: 10374636
    Abstract: One embodiment relates to a method of receiving data from a multi-lane data link. The data is encoded with an FEC code having a block length. The data is FEC encoded at a bus width that is specified within particular constraints. One constraint is that the FEC encoder bus width in bits is an exact multiple of a number of bits per symbol in the data. Another constraint may be that the FEC code block length is an exact multiple of the FEC encoder bus width. Another constraint may be that the FEC encoder bus width is an exact multiple of a number of serial lanes of the multi-lane interface. Other embodiments and features are also disclosed.
    Type: Grant
    Filed: April 4, 2016
    Date of Patent: August 6, 2019
    Assignee: Altera Corporation
    Inventors: Haiyun Yang, Martin Langhammer, Peng Li, Divya Vijayaraghavan
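The three bus-width constraints in this abstract reduce to divisibility checks, as the short sketch below shows. The example figures (10-bit symbols, a 5280-bit FEC block, 4 serial lanes, a 660-bit encoder bus) are illustrative and are not taken from the patent.

```python
# Check the FEC encoder bus-width constraints described in the abstract above.

def check_fec_bus_width(bus_width_bits, bits_per_symbol, block_length_bits, num_lanes):
    """Return each constraint from the abstract with a pass/fail flag."""
    return {
        "bus width is an exact multiple of the symbol size":
            bus_width_bits % bits_per_symbol == 0,
        "block length is an exact multiple of the bus width":
            block_length_bits % bus_width_bits == 0,
        "bus width is an exact multiple of the lane count":
            bus_width_bits % num_lanes == 0,
    }


# Example: a 5280-bit block of 10-bit symbols on a 4-lane link with a 660-bit bus.
print(check_fec_bus_width(bus_width_bits=660, bits_per_symbol=10,
                          block_length_bits=5280, num_lanes=4))
```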
  • Publication number: 20190228051
    Abstract: The present disclosure relates generally to techniques for efficiently performing operations associated with artificial intelligence (AI), machine learning (ML), and/or deep learning (DL) applications, such as training and/or inference calculations, using an integrated circuit device. More specifically, the present disclosure relates to an integrated circuit design implemented to perform these operations with low latency and/or a high bandwidth of data. For example, embodiments of computationally dense digital signal processing (DSP) circuitry, implemented to efficiently perform one or more arithmetic operations (e.g., a dot-product) on an input, are disclosed. Moreover, embodiments described herein may relate to layout, design, and data scheduling of a processing element array implemented to compute matrix multiplications (e.g., systolic array multiplication).
    Type: Application
    Filed: March 29, 2019
    Publication date: July 25, 2019
    Inventors: Martin Langhammer, Andrei-Mihai Hagiescu-Miriste
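The arithmetic at the heart of this entry, a dot product applied repeatedly to form a matrix multiplication, can be sketched as follows. Only the arithmetic is modeled; the processing-element array layout and data scheduling described in the application are not.

```python
# A dot product as the basic DSP operation, and a matrix multiply built from it
# the way a processing-element array would accumulate partial sums.

def dot(a, b):
    """Sum of element-wise products."""
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))


def matmul(A, B):
    """C[i][j] = dot(row i of A, column j of B)."""
    cols_B = list(zip(*B))                 # transpose so columns are easy to index
    return [[dot(row, col) for col in cols_B] for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))                        # [[19, 22], [43, 50]]
```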
  • Publication number: 20190213289
    Abstract: A method for designing a system on a target device is disclosed. The system is synthesized from a register transfer level description. The system is placed on the target device. The system is routed on the target device. A configuration file is generated that reflects the synthesizing, placing, and routing of the system for programming the target device. A modification for the system is identified. The configuration file is modified to effectuate the modification for the system without changing the placing and routing of the system.
    Type: Application
    Filed: March 18, 2019
    Publication date: July 11, 2019
    Inventors: Gregg William Baeckler, Martin Langhammer, Sergey Gribok, Scott J. Weber, Gregory Steinke
  • Patent number: 10340920
    Abstract: The present disclosure relates generally to techniques for enhancing adders implemented on an integrated circuit. In particular, arithmetic performed by an adder implemented to receive operands having a first precision may be restructured so that a set of sub-adders may perform the arithmetic on a respective segment of the operands. More specifically, the adder may be restructured so that a sub-adder of the set of sub-adders may concurrently output a generate signal and a propagate signal, which may both be routed to a prefix network. The prefix network may determine respective carry bit(s), which may carry into and/or select a sum at a subsequent sub-adder of the restructured adder. As a result, the integrated circuit may benefit from increased efficiencies, reduced latency, and reduced resource consumption (e.g., area and/or power) involved with implementing addition, which may improve operations such as encryption or machine learning on the integrated circuit.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: July 2, 2019
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Tim Michael Vanderhoek, Jeffery Christopher Chromczak, Trevis Chandler
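The segment-wise restructuring described in this entry can be modeled in software: each sub-adder emits a segment sum plus generate and propagate signals, and a prefix step resolves the carry into every segment. In the sketch below the prefix network is collapsed into a simple serial scan, and the 64-bit operand and 16-bit segment widths are arbitrary choices, not the patent's.

```python
# Restructured adder: per-segment sums with generate/propagate signals, carries
# resolved by a prefix step, and the carry selecting the final sum per segment.

def segmented_add(a, b, total_bits=64, seg_bits=16):
    mask = (1 << seg_bits) - 1
    n_segs = total_bits // seg_bits

    sums, gen, prop = [], [], []
    for i in range(n_segs):
        x = (a >> (i * seg_bits)) & mask
        y = (b >> (i * seg_bits)) & mask
        s = x + y
        sums.append(s & mask)
        gen.append(s > mask)                # this segment generates a carry out
        prop.append((s & mask) == mask)     # it would propagate an incoming carry

    # "Prefix network" (here just a scan): carry into each segment.
    carries, c = [], False
    for g, p in zip(gen, prop):
        carries.append(c)
        c = g or (p and c)

    result = 0
    for i, (s, cin) in enumerate(zip(sums, carries)):
        result |= ((s + cin) & mask) << (i * seg_bits)
    return result


a, b = 0xFFFF_FFFF_0000_1234, 0x0000_0001_FFFF_EDCC
assert segmented_add(a, b) == (a + b) & ((1 << 64) - 1)
print(hex(segmented_add(a, b)))
```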
  • Publication number: 20190196786
    Abstract: The present embodiments relate to integrated circuits with circuitry that efficiently performs mixed-precision floating-point arithmetic operations. Such circuitry may be implemented in specialized processing blocks. The specialized processing blocks may include configurable interconnect circuitry to support a variety of different use modes. For example, the specialized processing blocks may implement fixed-point addition, floating-point addition, fixed-point multiplication, floating-point multiplication, or a sum of two multiplications in a first floating-point precision, with or without casting to a second floating-point precision and, if desired, a subsequent addition in the second floating-point precision, just to name a few.
    Type: Application
    Filed: December 20, 2018
    Publication date: June 27, 2019
    Inventor: Martin Langhammer
  • Publication number: 20190190653
    Abstract: Network communication systems may employ coding schemes to provide error checking and/or error correction. Such schemes may include parity or check symbols in a message that add redundancy, which may be used to check for errors. For example, Ethernet may employ forward error correction (FEC) schemes using Reed-Solomon codes. An increase in the number of parity symbols may increase the power of the error-correcting scheme, but may lead to an increase in latency. Encoders and decoders that can be configured to produce variable-length messages while preserving compatibility with network standards are described. Decoders described herein may be able to verify long codewords by checking short codes and integrating the results. Encoders described herein may be able to generate codewords in multiple formats without replicating large segments of the circuitry.
    Type: Application
    Filed: December 14, 2017
    Publication date: June 20, 2019
    Inventors: Martin Langhammer, Mike Peng Li, Masashi Shimanouchi
  • Patent number: 10318241
    Abstract: The present embodiments relate to circuitry that efficiently performs floating-point arithmetic operations and fixed-point arithmetic operations. Such circuitry may be implemented in specialized processing blocks. If desired, the specialized processing blocks may include configurable interconnect circuitry to support a variety of different use modes. For example, the specialized processing block may efficiently perform a fixed-point or floating-point addition operation or a portion thereof, a fixed-point or floating-point multiplication operation or a portion thereof, a fixed-point or floating-point multiply-add operation or a portion thereof, just to name a few. In some embodiments, two or more specialized processing blocks may be arranged in a cascade chain and perform together more complex operations such as a recursive mode dot product of two vectors of floating-point numbers or a Radix-2 Butterfly circuit, just to name a few.
    Type: Grant
    Filed: August 6, 2018
    Date of Patent: June 11, 2019
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Publication number: 20190155574
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Application
    Filed: September 27, 2018
    Publication date: May 23, 2019
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
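A numerical sketch of the data path described in this entry: products are formed at a lower precision (float16 standing in for the first precision) and the running sum is kept at a higher precision (float32 standing in for the second), mirroring the optional cast before accumulation. This illustrates the arithmetic only, not the hardware structure.

```python
# Dot product with low-precision products accumulated at a higher precision.

import numpy as np

def mixed_precision_dot(a, b):
    a16 = np.asarray(a, dtype=np.float16)
    b16 = np.asarray(b, dtype=np.float16)
    acc = np.float32(0.0)                   # higher-precision accumulator
    for x, y in zip(a16, b16):
        acc += np.float32(x * y)            # float16 product, cast up, float32 add
    return acc


rng = np.random.default_rng(0)
a = rng.standard_normal(1024)
b = rng.standard_normal(1024)
print(mixed_precision_dot(a, b), float(np.dot(a, b)))   # close, but not identical
```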
  • Publication number: 20190155575
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Application
    Filed: November 20, 2017
    Publication date: May 23, 2019
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen
  • Publication number: 20190121614
    Abstract: Integrated circuits with specialized processing blocks are provided. A specialized processing block may include one real addition stage and one real multiplier stage. The multiplier stage may simultaneously feed its output to the addition stage and directly to an adjacent specialized processing block. The addition stage may also produce sum and difference outputs in parallel. A group of four such specialized processing blocks may be connected in a chain to implement a radix-2 fast Fourier transform (FFT) butterfly. Multiple radix-2 butterflies may be stacked to form yet higher order radix butterflies. If desired, the specialized processing block may also be used to implement a complex multiply operation. Three or four specialized processing blocks may be chained together, and along with one or more adders outside the specialized processing blocks, the real and imaginary portions of a complex product can be generated.
    Type: Application
    Filed: October 23, 2018
    Publication date: April 25, 2019
    Inventor: Martin Langhammer
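The radix-2 butterfly that a group of these blocks implements amounts to one complex multiply by a twiddle factor followed by a sum and a difference produced in parallel, as in the minimal sketch below. The mapping onto multiplier and adder stages is not modeled.

```python
# A single radix-2 FFT butterfly: multiply by the twiddle factor, then add/subtract.

import cmath

def radix2_butterfly(x, y, k, n):
    """Combine two inputs using twiddle factor W_n^k = exp(-2*pi*i*k/n)."""
    w = cmath.exp(-2j * cmath.pi * k / n)
    t = w * y                      # the multiplier stage
    return x + t, x - t            # the addition stage's sum and difference outputs


# A 2-point DFT is a single butterfly with k = 0.
print(radix2_butterfly(1 + 0j, 2 + 0j, k=0, n=2))    # ((3+0j), (-1+0j))
```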
  • Publication number: 20190121927
    Abstract: A method for implementing a multiplier on a programmable logic device (PLD) is disclosed. Partial product bits of the multiplier are identified, and it is determined how the partial product bits are to be summed to generate a final product from a multiplier and multiplicand. Chains of PLD cells, and cells within those chains, are assigned for generating and summing the partial product bits. It is determined whether a bit in an assigned cell in an assigned chain of PLD cells is under-utilized. In response to determining that a bit is under-utilized, the assignment of chains and cells for generating and summing the partial product bits is changed to improve the overall utilization of the chains of PLD cells and the cells in those chains.
    Type: Application
    Filed: December 12, 2018
    Publication date: April 25, 2019
    Inventors: Martin Langhammer, Sergey Gribok, Gregg William Baeckler
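The partial-product view of multiplication that this method starts from can be sketched directly: generate every bit-level partial product a_i AND b_j and add it at weight 2^(i+j). The packing of those bits into PLD carry chains, which is what the method optimizes, is not modeled here.

```python
# Multiplication as the weighted sum of bit-level partial products.

def multiply_by_partial_products(a, b, width=8):
    total = 0
    for i in range(width):                  # bit i of the first operand
        if not (a >> i) & 1:
            continue
        for j in range(width):              # bit j of the second operand
            if (b >> j) & 1:
                total += 1 << (i + j)       # one partial product bit, weighted
    return total


assert multiply_by_partial_products(0xB7, 0x5D) == 0xB7 * 0x5D
print(multiply_by_partial_products(0xB7, 0x5D))
```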
  • Publication number: 20190114140
    Abstract: An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs, each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, compute the sum for each segment, and compute the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the area and latency of an adder network built from infinitely fast ripple-carry adders.
    Type: Application
    Filed: November 30, 2018
    Publication date: April 18, 2019
    Applicant: Intel Corporation
    Inventor: Martin Langhammer
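The segment-and-combine idea in this entry can be modeled arithmetically: each segment position accumulates a segment sum and a separate carry count, and only the final combine step folds the carries back in. The adder-tree structure and population counters are collapsed into plain loops here, and the 128-bit segment width is an arbitrary choice.

```python
# Very large adder: per-segment sums and carry counts kept separate until the end.

def large_adder(operands, seg_bits=128):
    """Add a list of very wide unsigned integers segment by segment."""
    mask = (1 << seg_bits) - 1
    n_segs = (max(operands).bit_length() + seg_bits - 1) // seg_bits

    seg_sums, carry_counts = [], []
    for i in range(n_segs):
        total = sum((op >> (i * seg_bits)) & mask for op in operands)
        seg_sums.append(total & mask)            # segment sum, independent of carries
        carry_counts.append(total >> seg_bits)   # carries counted, not yet applied

    # Final combine: segment sums plus carries shifted into the next segment up.
    result = 0
    for i, (s, c) in enumerate(zip(seg_sums, carry_counts)):
        result += (s << (i * seg_bits)) + (c << ((i + 1) * seg_bits))
    return result


ops = [(1 << 1000) - 12345, (1 << 999) + 777, 3 << 512]
assert large_adder(ops) == sum(ops)
print(large_adder(ops) == sum(ops))             # True
```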
  • Patent number: 10237066
    Abstract: A scalable and efficient cryptographic architecture is provided for processing data using deeply pipelined algorithms and circuits. The architecture can be implemented as circuitry in a fixed logic device, or can be configured into a programmable integrated circuit device. The same top-level design may be used for different choices of data channels, processing depth, parallelism level, and/or system throughput. An encryption pipeline processing block performs rounds of processing upon a block of said data using an encryption process and receives a respective round encryption key for each round of processing. An encryption key pipeline block provides the respective round encryption key for each round of processing by selecting, for each round of processing, the respective round encryption key from at least a first round encryption key corresponding to a first channel and a second round encryption key corresponding to a second channel.
    Type: Grant
    Filed: April 8, 2014
    Date of Patent: March 19, 2019
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Shawn Nicholl, Cheng Wang
  • Publication number: 20190079728
    Abstract: An integrated circuit may include a floating-point adder. The adder may be implemented using a dual-path adder architecture having a near path and a far path. The near path may include a leading zero anticipator (LZA), a comparison circuit for comparing an exponent value to an LZA count, and associated circuitry for handling subnormal numbers. The far path may include a subtraction circuit for computing the difference between a received exponent value and a minimum exponent value, at least two shifters for shifting far greater and far lesser mantissa values in parallel, and associated circuitry for handling subnormal numbers. The adder may be dynamically configured to support a first mode that processes FP16 at inputs and outputs, a second mode that processes modified FP16? inputs, and a third mode that processes FP16? at inputs and outputs.
    Type: Application
    Filed: September 14, 2017
    Publication date: March 14, 2019
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca
  • Patent number: 10230399
    Abstract: Decoder circuitry for an input channel having a data rate, where a codeword on the input channel includes a plurality of symbols, includes options to provide a first output channel having that data rate, and a plurality of second output channels having slower data rates. The decoder circuitry includes syndrome calculation circuitry, polynomial calculation circuitry, and search-and-correct circuitry. The syndrome calculation circuitry includes finite-field multipliers for multiplying each symbol by a power of a root of the field. Each multiplier other than a first multiplier multiplies a symbol by a higher power of the root than an adjacent multiplier. First-level adders add outputs of a number of groups of multipliers. A second-level adder adds outputs of the first-level adders to be accumulated as syndromes of the first output channel. Another plurality of accumulators accumulates outputs of the first-level adders, which after scaling, are syndromes of the second output channels.
    Type: Grant
    Filed: January 4, 2016
    Date of Patent: March 12, 2019
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Publication number: 20190065188
    Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
    Type: Application
    Filed: October 8, 2018
    Publication date: February 28, 2019
    Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
  • Patent number: 10218386
    Abstract: A Reed-Solomon encoder that supports multiple code words is provided. The encoder circuit may include partial syndrome calculation circuitry, three matrix multiplication circuits, and two adder circuits. The partial syndrome calculation circuitry may receive a message and generate partial syndromes. The first matrix multiplication circuit may multiply a lower portion of the partial syndromes by a small Lagrange matrix to produce a small parity symbol vector. The second matrix multiplication circuit may multiply the small parity symbol vector by a Vandermonde matrix to produce a product vector. The first adder circuit may add the product vector to an upper portion of the partial syndromes to produce a sum vector. The third matrix multiplication circuit may multiply the sum vector by a large Lagrange matrix to produce a large product vector. The large product vector may be selectively combined with the small parity symbol vector to generate final parity check symbols.
    Type: Grant
    Filed: November 22, 2016
    Date of Patent: February 26, 2019
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Simon Finn, Sami Mumtaz