Patents by Inventor Martin Langhammer

Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11334318
    Abstract: The present disclosure relates generally to techniques for enhancing adders implemented on an integrated circuit. In particular, arithmetic performed by an adder implemented to receive operands having a first precision is restructured so that a set of sub-adders performs the arithmetic on a respective segment of the operands. More specifically, the adder is restructured, and a decoder determines a generate signal and a propagate signal for each of the sub-adders and routes the generate signal and the propagate signal to a prefix network. The prefix network determines respective carry bit(s), which carry into and/or select a sum at a subsequent sub-adder.
    Type: Grant
    Filed: August 1, 2018
    Date of Patent: May 17, 2022
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Mihai Pasca, Sergey Vladimirovich Gribok
  • Publication number: 20220121593
    Abstract: A processor circuit includes a first front-end circuit for scheduling first instructions for a first program and a second front-end circuit for scheduling second instructions for a second program. A back-end processing circuit processes first operations in the first instructions and second operations in the second instructions. A multi-program scheduler circuit causes the first front-end circuit to schedule processing of the first operations on the back-end processing circuit and causes the second front-end circuit to schedule processing of the second operations on the back-end processing circuit. A processor generator system includes a processor designer that creates specifications for a processor using workloads for a program, a processor generator that generates a first processor instance using the specifications, a processor optimizer that generates a second processor instance using the workloads, and a co-designer that modifies the program using the second processor instance.
    Type: Application
    Filed: December 23, 2021
    Publication date: April 21, 2022
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Martin Langhammer, Andrew Boutros
  • Publication number: 20220113940
    Abstract: This disclosure is directed to a digital signal processing (DSP) block that includes multiple weight registers configurable to receive and store a first plurality of values having multiple precisions, and multiple multipliers that are each configurable to receive a respective value of the first plurality of values. The DSP block further includes one or more inputs configurable to receive a second plurality of values, and a multiplexer network configurable to receive the second plurality of values and route each respective value of the second plurality of values to a multiplier of the multipliers. The multipliers are configurable to simultaneously multiply each value of the first plurality of values by a respective value of the second plurality of values to generate a plurality of products. Additionally, the DSP block includes adder circuitry configurable to generate a first sum and a second sum based on the plurality of products.
    Type: Application
    Filed: December 22, 2021
    Publication date: April 14, 2022
    Inventors: Martin Langhammer, Michael Wu, Nihat Engin Tunali
  • Publication number: 20220114236
    Abstract: The present disclosure relates generally to techniques for efficiently performing operations associated with artificial intelligence (AI), machine learning (ML), and/or deep learning (DL) applications, such as training and/or inference calculations, using an integrated circuit device. More specifically, the present disclosure relates to an integrated circuit design implemented to perform these operations with low latency and/or a high bandwidth of data. For example, embodiments of a computationally dense digital signal processing (DSP) circuitry, implemented to efficiently perform one or more arithmetic operations (e.g., a dot-product) on an input are disclosed. Moreover, embodiments described herein may relate to layout, design, and data scheduling of a processing element array implemented to compute matrix multiplications (e.g., systolic array multiplication).
    Type: Application
    Filed: December 23, 2021
    Publication date: April 14, 2022
    Inventors: Martin Langhammer, Andrei-Mihai Hagiescu-Miriste
  • Patent number: 11301611
    Abstract: Methods and apparatus for increasing the random logic utilization on a programmable device are provided. Although not completely homogeneous, the programmable device has many components that are repeated many times in an array. To help improve repeatability and packing, computer-aided design tools for compiling a circuit design for the programmable device may first lock down a synthesis cell netlist with stable naming, create location solution files (files with desired clustering granularity for stabilizing performance and reducing compile times) for selected regions of interest on the programmable device, and compose a final design with only the best solutions some of which can be imported from one location to another. Compiling a design in this way can help improve random logic utilization beyond 85% while improving circuit performance by 20% or more.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: April 12, 2022
    Assignee: Intel Corporation
    Inventors: Gregg William Baeckler, Martin Langhammer
  • Patent number: 11301213
    Abstract: An integrated circuit with a large multiplier is provided. The multiplier may be configured to receive large input operands with thousands of bits. The multiplier may be implemented using a multiplier decomposition scheme that is recursively flattened into multiple decomposition levels to expose a tree of adders. The adders may be collapsed into a merged pipelined structure, where partial sums are forwarded from one level to the next while bypassing intervening prefix networks. The final correct sum is not calculated until later. In accordance with the decomposition technique, the partial sums are successively halved, which allows the prefix networks to be smaller from one level to the next. This allows all sums to be calculated at approximately the same pipeline depth, which significantly reduces latency with no or limited pipeline balancing.
    Type: Grant
    Filed: June 24, 2019
    Date of Patent: April 12, 2022
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca
  • Publication number: 20220107783
    Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry. In some embodiments, the multiplication is implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry.
    Type: Application
    Filed: December 16, 2021
    Publication date: April 7, 2022
    Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
  • Patent number: 11294626
    Abstract: The present disclosure relates generally to techniques for adjusting the number representation (e.g., format) of a variable before and/or after performing one or more arithmetic operations on the variable. In particular, the present disclosure relates to scaling the range of a variable to a suitable representation based on available hardware (e.g., hard logic) in an integrated circuit device. For example, an input in a first number format (e.g., bfloat16) may be scaled to a second number format (e.g., half-precision floating-point) so that circuitry implemented to receive inputs in the second number format may perform one or more arithmetic operations on the input. Further, the output produced by the circuitry may be scaled back to the first number format. Accordingly, arithmetic operations, such as a dot-product, performed in a first format may be emulated by scaling the inputs to and/or the outputs from arithmetic operations performed in another format.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: April 5, 2022
    Assignee: Intel Corporation
    Inventors: Bogdan Mihai Pasca, Martin Langhammer
  • Patent number: 11275998
    Abstract: The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.
    Type: Grant
    Filed: May 31, 2018
    Date of Patent: March 15, 2022
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Sudarshan Srinivasan, Gregg William Baeckler, Duncan Moss, Sasikanth Avancha, Dipankar Das
  • Patent number: 11256978
    Abstract: The present disclosure relates generally to techniques for enhancing recurrent neural networks (RNNs) implemented on an integrated circuit. In particular, approximations of activation functions used in an RNN, such as sigmoid and hyperbolic tangent, may be implemented in an integrated circuit, which may result in increased efficiencies, reduced latency, increased accuracy, and reduced resource consumption involved with implementing machine learning.
    Type: Grant
    Filed: January 5, 2018
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventors: Bogdan Pasca, Martin Langhammer
  • Patent number: 11249726
    Abstract: An integrated circuit is provided with a modular multiplication circuit. The modular multiplication circuit includes an input multiplier for computing the product of two input signals, truncated multipliers for computing another product based on a modulus value and the product, and a subtraction circuit for computing a difference between the two products. An error correction circuit uses the difference to look up an estimated quotient value and to subtract out an integer multiple of the modulus value from the difference in a single step, wherein the integer multiple is equal to the estimated quotient value. A final adjustment stage is used to remove any remaining residual estimation error.
    Type: Grant
    Filed: September 10, 2019
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca
  • Publication number: 20220027128
    Abstract: Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.
    Type: Application
    Filed: October 4, 2021
    Publication date: January 27, 2022
    Inventors: Keone Streicher, Martin Langhammer, Yi-Wen Lin, Hyun Yi
  • Publication number: 20220012015
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
  • Publication number: 20220012012
    Abstract: This disclosure is directed to a digital signal processing (DSP) block that includes multiple weight registers configurable to receive and store a first plurality of values, and multiple multipliers that are each configurable to receive a respective value of the first plurality of values. The DSP block further includes one or more inputs configurable to receive a second plurality of values, and a multiplexer network configurable to receive the second plurality of values and route each respective value of the second plurality of values to a multiplier of the multipliers. The multipliers are configurable to simultaneously multiply each value of the first plurality of values by a respective value of the second plurality of values to generate a plurality of products. Additionally, the DSP block includes adder circuitry configurable to generate a first sum and a second sum based on the plurality of products.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Martin Langhammer, Michael Wu, Nihat Engin Tunali, Ilya Ganusov
  • Patent number: 11216532
    Abstract: The present disclosure relates generally to techniques for efficiently performing operations associated with artificial intelligence (AI), machine learning (ML), and/or deep learning (DL) applications, such as training and/or inference calculations, using an integrated circuit device. More specifically, the present disclosure relates to an integrated circuit design implemented to perform these operations with low latency and/or a high bandwidth of data. For example, embodiments of a computationally dense digital signal processing (DSP) circuitry, implemented to efficiently perform one or more arithmetic operations (e.g., a dot-product) on an input are disclosed. Moreover, embodiments described herein may relate to layout, design, and data scheduling of a processing element array implemented to compute matrix multiplications (e.g., systolic array multiplication).
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: January 4, 2022
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Andrei-Mihai Hagiescu-Miriste
  • Patent number: 11216249
    Abstract: A method for designing a system on a target device includes identifying a length for a carry chain that is supported by predefined quanta of a resource on the target device. A plurality of logical adders is mapped onto a single logical adder implemented on the carry chain subject to the identified length to increase logic utilization in a design for the system.
    Type: Grant
    Filed: March 27, 2018
    Date of Patent: January 4, 2022
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Gregg William Baeckler
  • Patent number: 11210063
    Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry. The hybrid dot-product circuitry has a hard data path that uses digital signal processing (DSP) blocks operating in floating-point mode and a hard/soft data path that uses DSP blocks operating in fixed-point mode operated in conjunction with general purpose soft logic. The hard/soft data path includes 2-element dot-product circuits that feed an adder tree. Results from the hard data path are combined with the adder tree using format conversion and normalization circuitry. Inputs to the hybrid dot-product circuitry may be in the BFLOAT16 format. The hard data path may be in the single precision format. The hard/soft data path uses a custom format that is similar to but different than BFLOAT16.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: December 28, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
  • Publication number: 20210397414
    Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein the plurality of arithmetic blocks each contain multiple multipliers, and wherein the logic is to combine multipliers one or more of within each arithmetic block or across multiple arithmetic blocks. In one example, one or more intermediate multipliers are of a size that is less than precisions supported by arithmetic blocks containing the one or more intermediate multipliers.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 23, 2021
    Inventors: Arnab Raha, Mark A. Anders, Martin Power, Martin Langhammer, Himanshu Kaul, Debabrata Mohapatra, Gautham Chinya, Cormac Brick, Ram Krishnamurthy
  • Patent number: 11175892
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: November 16, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen
  • Patent number: 11163530
    Abstract: Multiplier circuitry includes first combinatorial circuitry configured to perform a combinatorial function, based at least in part on redundant form arithmetic, to generate a first subset of two or more partial products. The two or more partial products are based at least in part on a first input to the multiplier circuitry and a second input to the multiplier circuitry. The multiplier circuitry also includes a carry chain that includes a second combinatorial circuitry configured to generate a second subset of the two or more partial products based at least in part on the first input and the second input. Furthermore, the carry chain includes one or more binary ripple-carry adders configured to generate a product of the multiplier circuitry based at least in part on a sum of the two or more partial products.
    Type: Grant
    Filed: March 22, 2018
    Date of Patent: November 2, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Gregg William Baeckler
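
The abstract of patent 11334318 above describes splitting a wide adder into sub-adders, deriving a generate and a propagate signal per sub-adder, and letting a prefix network resolve the carries. The following is a minimal software sketch of that general idea, not the patented circuit: the function name `segmented_add`, the 32-bit width, and the 8-bit segment size are assumptions chosen for illustration, and the prefix network is modeled as a simple serial scan rather than a parallel structure.

```python
def segmented_add(a: int, b: int, width: int = 32, seg: int = 8) -> int:
    """Add two unsigned `width`-bit integers using `seg`-bit sub-adders.

    Illustrative model only; names and sizes are hypothetical.
    """
    assert width % seg == 0, "width must be a multiple of the segment size"
    mask = (1 << seg) - 1
    n = width // seg

    # Each sub-adder sums its own segment of the operands with carry-in 0.
    sums = [((a >> (i * seg)) & mask) + ((b >> (i * seg)) & mask)
            for i in range(n)]

    # Generate: the segment overflows on its own.
    # Propagate: the segment sum is all ones, so a carry-in would pass through.
    gen = [s >> seg for s in sums]
    prop = [int((s & mask) == mask) for s in sums]

    # Prefix network, modeled serially: compute the carry into each segment.
    carries = []
    carry = 0
    for g, p in zip(gen, prop):
        carries.append(carry)
        carry = g | (p & carry)

    # Each segment's final sum is its local sum plus the resolved carry-in.
    result = 0
    for i in range(n):
        result |= ((sums[i] + carries[i]) & mask) << (i * seg)
    return result


if __name__ == "__main__":
    x, y = 0x00FF00FF, 0x00010001
    print(hex(segmented_add(x, y)))
```

In hardware the carry resolution would typically be done by a parallel prefix structure so that all carries are available at a similar logic depth; the serial loop here only reproduces the same carry values.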