Patents by Inventor Martin Langhammer

Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Prefix network-directed addition

Patent number: 11334318

Abstract: The present disclosure relates generally to techniques for enhancing adders implemented on an integrated circuit. In particular, arithmetic performed by an adder implemented to receive operands having a first precision is restructured so that a set of sub-adders performs the arithmetic on a respective segment of the operands. More specifically, the adder is restructured, and a decoder determines a generate signal and a propagate signal for each of the sub-adders and routes the generate signal and the propagate signal to a prefix network. The prefix network determines respective carry bit(s), which carry into and/or select a sum at a subsequent sub-adder.
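The segment-wise scheme in this abstract can be illustrated with a small software model. This is a minimal sketch, not the patented circuit: the function name `segmented_add`, the 32-bit operand width, and the 8-bit segment size are assumptions chosen for illustration, and the carry scan is written serially where hardware would use a parallel prefix network.

```python
def segmented_add(a: int, b: int, width: int = 32, seg: int = 8) -> int:
    """Add two width-bit integers with seg-bit sub-adders (illustrative model).

    Assumes width is a multiple of seg. Returns (a + b) modulo 2**width.
    """
    mask = (1 << seg) - 1
    n = width // seg
    sums, gen, prop = [], [], []
    for i in range(n):
        x = (a >> (i * seg)) & mask
        y = (b >> (i * seg)) & mask
        s = x + y
        sums.append(s & mask)
        gen.append(s >> seg)               # generate: segment overflows on its own
        prop.append((s & mask) == mask)    # propagate: a carry-in would pass through

    # Carry resolution. Written as a serial scan here; a prefix network
    # evaluates the same recurrence in parallel.
    result = 0
    carry = 0
    for i in range(n):
        result |= ((sums[i] + carry) & mask) << (i * seg)
        carry = gen[i] | int(prop[i] and carry)
    return result
```

For example, `segmented_add(0x00FF00FF, 0x00010001)` carries out of segments 0 and 2 into segments 1 and 3.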

Type: Grant

Filed: August 1, 2018

Date of Patent: May 17, 2022

Assignee: Intel Corporation

Inventors: Martin Langhammer, Bogdan Mihai Pasca, Sergey Vladimirovich Gribok

Systems And Methods For Processor Circuits

Publication number: 20220121593

Abstract: A processor circuit includes a first front-end circuit for scheduling first instructions for a first program and a second front-end circuit for scheduling second instructions for a second program. A back-end processing circuit processes first operations in the first instructions and second operations in the second instructions. A multi-program scheduler circuit causes the first front-end circuit to schedule processing of the first operations on the back-end processing circuit and causes the second front-end circuit to schedule processing of the second operations on the back-end processing circuit. A processor generator system includes a processor designer that creates specifications for a processor using workloads for a program, a processor generator that generates a first processor instance using the specifications, a processor optimizer that generates a second processor instance using the workloads, and a co-designer that modifies the program using the second processor instance.

Type: Application

Filed: December 23, 2021

Publication date: April 21, 2022

Applicant: Intel Corporation

Inventors: Eriko Nurvitadhi, Martin Langhammer, Andrew Boutros

Systems and Methods for Structured Mixed-Precision in a Specialized Processing Block

Publication number: 20220113940

Abstract: This disclosure is directed to a digital signal processing (DSP) block that includes multiple weight registers configurable to receive and store a first plurality of values having multiple precisions, and multiple multipliers that are each configurable to receive a respective value of the first plurality of values. The DSP block further includes one or more inputs configurable to receive a second plurality of values, and a multiplexer network configurable to receive the second plurality of values and route each respective value of the second plurality of values to a multiplier of the multipliers. The multipliers are configurable to simultaneously multiply each value of the first plurality of values by a respective value of the second plurality of values to generate a plurality of products. Additionally, the DSP block includes adder circuitry configurable to generate a first sum and a second sum based on the plurality of products.

Type: Application

Filed: December 22, 2021

Publication date: April 14, 2022

Inventors: Martin Langhammer, Michael Wu, Nihat Engin Tunali

CIRCUITRY FOR HIGH-BANDWIDTH, LOW-LATENCY MACHINE LEARNING

Publication number: 20220114236

Abstract: The present disclosure relates generally to techniques for efficiently performing operations associated with artificial intelligence (AI), machine learning (ML), and/or deep learning (DL) applications, such as training and/or inference calculations, using an integrated circuit device. More specifically, the present disclosure relates to an integrated circuit design implemented to perform these operations with low latency and/or a high bandwidth of data. For example, embodiments of a computationally dense digital signal processing (DSP) circuitry, implemented to efficiently perform one or more arithmetic operations (e.g., a dot-product) on an input are disclosed. Moreover, embodiments described herein may relate to layout, design, and data scheduling of a processing element array implemented to compute matrix multiplications (e.g., systolic array multiplication).

Type: Application

Filed: December 23, 2021

Publication date: April 14, 2022

Inventors: Martin Langhammer, Andrei-Mihai Hagiescu-Miriste

Deterministic clustering and packing method for random logic on programmable integrated circuits

Patent number: 11301611

Abstract: Methods and apparatus for increasing the random logic utilization on a programmable device are provided. Although not completely homogeneous, the programmable device has many components that are repeated many times in an array. To help improve repeatability and packing, computer-aided design tools for compiling a circuit design for the programmable device may first lock down a synthesis cell netlist with stable naming, create location solution files (files with desired clustering granularity for stabilizing performance and reducing compile times) for selected regions of interest on the programmable device, and compose a final design with only the best solutions, some of which can be imported from one location to another. Compiling a design in this way can help improve random logic utilization beyond 85% while improving circuit performance by 20% or more.

Type: Grant

Filed: December 19, 2019

Date of Patent: April 12, 2022

Assignee: Intel Corporation

Inventors: Gregg William Baeckler, Martin Langhammer

Reduced latency multiplier circuitry for very large numbers

Patent number: 11301213

Abstract: An integrated circuit with a large multiplier is provided. The multiplier may be configured to receive large input operands with thousands of bits. The multiplier may be implemented using a multiplier decomposition scheme that is recursively flattened into multiple decomposition levels to expose a tree of adders. The adders may be collapsed into a merged pipelined structure, where partial sums are forwarded from one level to the next while bypassing intervening prefix networks. The final correct sum is not calculated until later. In accordance with the decomposition technique, the partial sums are successively halved, which allows the prefix networks to be smaller from one level to the next. This allows all sums to be calculated at approximately the same pipeline depth, which significantly reduces latency with no or limited pipeline balancing.

Type: Grant

Filed: June 24, 2019

Date of Patent: April 12, 2022

Assignee: Intel Corporation

Inventors: Martin Langhammer, Bogdan Pasca

MACHINE LEARNING TRAINING ARCHITECTURE FOR PROGRAMMABLE DEVICES

Publication number: 20220107783

Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry. In some embodiments, the multiplication is implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry.

Type: Application

Filed: December 16, 2021

Publication date: April 7, 2022

Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu

Floating-point dynamic range expansion

Patent number: 11294626

Abstract: The present disclosure relates generally to techniques for adjusting the number representation (e.g., format) of a variable before and/or after performing one or more arithmetic operations on the variable. In particular, the present disclosure relates to scaling the range of a variable to a suitable representation based on available hardware (e.g., hard logic) in an integrated circuit device. For example, an input in a first number format (e.g., bfloat16) may be scaled to a second number format (e.g., half-precision floating-point) so that circuitry implemented to receive inputs in the second number format may perform one or more arithmetic operations on the input. Further, the output produced by the circuitry may be scaled back to the first number format. Accordingly, arithmetic operations, such as a dot-product, performed in a first format may be emulated by scaling the inputs to and/or the outputs from arithmetic operations performed in another format.
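The scale-compute-rescale idea in this abstract can be sketched in software. The following is a minimal illustration under stated assumptions, not the patented method: it models only the exponent-range problem (values too large or small for half precision), uses one shared power-of-two scale per vector, and accumulates in ordinary Python floats. The names `to_half`, `block_exponent`, and `scaled_dot` are hypothetical.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round x to IEEE half precision; raises OverflowError if x is out of range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def block_exponent(values) -> int:
    """Largest binary exponent in values (0 if all entries are zero)."""
    return max((math.frexp(v)[1] for v in values if v != 0.0), default=0)


def scaled_dot(xs, ys) -> float:
    """Dot product computed on half-precision inputs after power-of-two scaling."""
    ex, ey = block_exponent(xs), block_exponent(ys)
    # After scaling, every magnitude is below 1.0, so it fits the half-precision range.
    hx = [to_half(math.ldexp(v, -ex)) for v in xs]
    hy = [to_half(math.ldexp(v, -ey)) for v in ys]
    acc = sum(p * q for p, q in zip(hx, hy))
    # Scale the result back to the original range.
    return math.ldexp(acc, ex + ey)
```

Calling `to_half(1e30)` directly raises `OverflowError`, whereas `scaled_dot([1e30, 2e30], [3e-30, 1e-30])` stays in range because the scaling by powers of two only shifts exponents and is undone at the end.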

Type: Grant

Filed: September 27, 2018

Date of Patent: April 5, 2022

Assignee: Intel Corporation

Inventors: Bogdan Mihai Pasca, Martin Langhammer

Circuitry for low-precision deep learning

Patent number: 11275998

Abstract: The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.

Type: Grant

Filed: May 31, 2018

Date of Patent: March 15, 2022

Assignee: Intel Corporation

Inventors: Martin Langhammer, Sudarshan Srinivasan, Gregg William Baeckler, Duncan Moss, Sasikanth Avancha, Dipankar Das

Hyperbolic functions for machine learning acceleration

Patent number: 11256978

Abstract: The present disclosure relates generally to techniques for enhancing recurrent neural networks (RNNs) implemented on an integrated circuit. In particular, approximations of activation functions used in an RNN, such as sigmoid and hyperbolic tangent, may be implemented in an integrated circuit, which may result in increased efficiencies, reduced latency, increased accuracy, and reduced resource consumption involved with implementing machine learning.
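As a generic illustration of approximating these activation functions, the sketch below uses a small lookup table with linear interpolation for tanh and derives sigmoid from the identity sigmoid(x) = (1 + tanh(x/2)) / 2. The abstract does not specify the approximation used in the patent; the table step, saturation limit, and function names here are assumptions.

```python
import math

STEP = 1.0 / 16.0   # table spacing (assumed)
LIMIT = 4.0         # beyond this magnitude the output saturates to +/-1 (assumed)
TABLE = [math.tanh(i * STEP) for i in range(int(LIMIT / STEP) + 1)]


def tanh_approx(x: float) -> float:
    """Piecewise-linear tanh using odd symmetry and saturation."""
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a >= LIMIT:
        return sign
    pos = a / STEP
    i = int(pos)
    frac = pos - i
    return sign * (TABLE[i] + frac * (TABLE[i + 1] - TABLE[i]))


def sigmoid_approx(x: float) -> float:
    """Sigmoid derived from the tanh approximation."""
    return 0.5 * (1.0 + tanh_approx(0.5 * x))
```

A finer `STEP` or a larger `LIMIT` trades table size for accuracy, which is the kind of resource and accuracy balance the abstract refers to.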

Type: Grant

Filed: January 5, 2018

Date of Patent: February 22, 2022

Assignee: Intel Corporation

Inventors: Bogdan Pasca, Martin Langhammer

Integrated circuits with modular multiplication circuitry

Patent number: 11249726

Abstract: An integrated circuit is provided with a modular multiplication circuit. The modular multiplication circuit includes an input multiplier for computing the product of two input signals, truncated multipliers for computing another product based on a modulus value and the product, and a subtraction circuit for computing a difference between the two products. An error correction circuit uses the difference to look up an estimated quotient value and to subtract out an integer multiple of the modulus value from the difference in a single step, wherein the integer multiple is equal to the estimated quotient value. A final adjustment stage is used to remove any remaining residual estimation error.
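The multiply, estimate, subtract, and adjust flow in this abstract resembles Barrett-style reduction, a well-known technique that is swapped in here for illustration. This sketch is not the patented circuit: it replaces the lookup-based single-step correction with a short adjustment loop, and the names `barrett_setup` and `modmul` are hypothetical.

```python
def barrett_setup(m: int):
    """Precompute the bit length k of the modulus and mu = floor(2**(2k) / m)."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def modmul(a: int, b: int, m: int, k: int, mu: int) -> int:
    """Compute (a * b) mod m for 0 <= a, b < m using a truncated quotient estimate."""
    p = a * b                                  # full product of the two inputs
    q_est = ((p >> (k - 1)) * mu) >> (k + 1)   # estimate of p // m from truncated products
    r = p - q_est * m                          # difference; at most a few multiples of m too large
    while r >= m:                              # final adjustment removes the residual error
        r -= m
    return r
```

Usage: call `k, mu = barrett_setup(m)` once per modulus, then `modmul(a, b, m, k, mu)` for each product. The quotient estimate never exceeds the true quotient, so the difference is non-negative and only a small number of subtractions is needed.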

Type: Grant

Filed: September 10, 2019

Date of Patent: February 15, 2022

Assignee: Intel Corporation

Inventors: Martin Langhammer, Bogdan Pasca

Programmable Device Implementing Fixed and Floating Point Functionality in a Mixed Architecture

Publication number: 20220027128

Abstract: Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.

Type: Application

Filed: October 4, 2021

Publication date: January 27, 2022

Inventors: Keone Streicher, Martin Langhammer, Yi-Wen Lin, Hyun Yi

INTEGRATED CIRCUITS WITH MACHINE LEARNING EXTENSIONS

Publication number: 20220012015

Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.

Type: Application

Filed: September 24, 2021

Publication date: January 13, 2022

Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd

Systems and Methods for Sparsity Operations in a Specialized Processing Block

Publication number: 20220012012

Abstract: This disclosure is directed to a digital signal processing (DSP) block that includes multiple weight registers configurable to receive and store a first plurality of values, and multiple multipliers that are each configurable to receive a respective value of the first plurality of values. The DSP block further includes one or more inputs configurable to receive a second plurality of values, and a multiplexer network configurable to receive the second plurality of values and route each respective value of the second plurality of values to a multiplier of the multipliers. The multipliers are configurable to simultaneously multiply each value of the first plurality of values by a respective value of the second plurality of values to generate a plurality of products. Additionally, the DSP block includes adder circuitry configurable to generate a first sum and a second sum based on the plurality of products.

Type: Application

Filed: September 24, 2021

Publication date: January 13, 2022

Inventors: Martin Langhammer, Michael Wu, Nihat Engin Tunali, Ilya Ganusov

Circuitry for high-bandwidth, low-latency machine learning

Patent number: 11216532

Abstract: The present disclosure relates generally to techniques for efficiently performing operations associated with artificial intelligence (AI), machine learning (ML), and/or deep learning (DL) applications, such as training and/or inference calculations, using an integrated circuit device. More specifically, the present disclosure relates to an integrated circuit design implemented to perform these operations with low latency and/or a high bandwidth of data. For example, embodiments of a computationally dense digital signal processing (DSP) circuitry, implemented to efficiently perform one or more arithmetic operations (e.g., a dot-product) on an input are disclosed. Moreover, embodiments described herein may relate to layout, design, and data scheduling of a processing element array implemented to compute matrix multiplications (e.g., systolic array multiplication).

Type: Grant

Filed: March 29, 2019

Date of Patent: January 4, 2022

Assignee: Intel Corporation

Inventors: Martin Langhammer, Andrei-Mihai Hagiescu-Miriste

Method and apparatus for performing field programmable gate array packing with continuous carry chains

Patent number: 11216249

Abstract: A method for designing a system on a target device includes identifying a length for a carry chain that is supported by predefined quanta of a resource on the target device. A plurality of logical adders is mapped onto a single logical adder implemented on the carry chain subject to the identified length to increase logic utilization in a design for the system.
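The idea of mapping several logical adders onto one longer adder can be modelled in software by packing operands into lanes of a single wide word. This is a minimal sketch under assumptions: the one-bit guard per lane and the name `packed_add` are illustrative choices and are not taken from the patent.

```python
def packed_add(pairs, width: int):
    """Perform several independent width-bit additions with one wide addition.

    Each lane is width + 1 bits, so a lane's carry-out lands in its own top bit
    and never spills into the neighbouring lane.
    """
    lane = width + 1
    a_word = 0
    b_word = 0
    for i, (a, b) in enumerate(pairs):
        a_word |= a << (i * lane)
        b_word |= b << (i * lane)

    total = a_word + b_word          # the single addition on one long carry chain

    mask = (1 << lane) - 1
    return [(total >> (i * lane)) & mask for i in range(len(pairs))]
```

For example, `packed_add([(255, 1), (100, 27), (200, 200)], 8)` performs three 8-bit additions in one 27-bit addition and returns each sum, including its carry-out bit.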

Type: Grant

Filed: March 27, 2018

Date of Patent: January 4, 2022

Assignee: Intel Corporation

Inventors: Martin Langhammer, Gregg William Baeckler

Machine learning training architecture for programmable devices

Patent number: 11210063

Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry. The hybrid dot-product circuitry has a hard data path that uses digital signal processing (DSP) blocks operating in floating-point mode and a hard/soft data path that uses DSP blocks operating in fixed-point mode operated in conjunction with general purpose soft logic. The hard/soft data path includes 2-element dot-product circuits that feed an adder tree. Results from the hard data path are combined with the adder tree using format conversion and normalization circuitry. Inputs to the hybrid dot-product circuitry may be in the BFLOAT16 format. The hard data path may be in the single precision format. The hard/soft data path uses a custom format that is similar to but different than BFLOAT16.

Type: Grant

Filed: September 27, 2019

Date of Patent: December 28, 2021

Assignee: Intel Corporation

Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu

AREA AND ENERGY EFFICIENT MULTI-PRECISION MULTIPLY-ACCUMULATE UNIT-BASED PROCESSOR

Publication number: 20210397414

Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein the plurality of arithmetic blocks each contain multiple multipliers, and wherein logic is to combine multipliers within each arithmetic block, across multiple arithmetic blocks, or both. In one example, one or more intermediate multipliers are of a size that is less than precisions supported by arithmetic blocks containing the one or more intermediate multipliers.

Type: Application

Filed: June 25, 2021

Publication date: December 23, 2021

Inventors: Arnab Raha, Mark A. Anders, Martin Power, Martin Langhammer, Himanshu Kaul, Debabrata Mohapatra, Gautham Chinya, Cormac Brick, Ram Krishnamurthy

Integrated circuits with machine learning extensions

Patent number: 11175892

Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.

Type: Grant

Filed: November 20, 2017

Date of Patent: November 16, 2021

Assignee: Intel Corporation

Inventors: Martin Langhammer, Dongdong Chen

Programmable-logic-directed multiplier mapping

Patent number: 11163530

Abstract: Multiplier circuitry includes first combinatorial circuitry configured to perform a combinatorial function, based at least in part on redundant form arithmetic, to generate a first subset of two or more partial products. The two or more partial products are based at least in part on a first input to the multiplier circuitry and a second input to the multiplier circuitry. The multiplier circuitry also includes a carry chain that includes a second combinatorial circuitry configured to generate a second subset of the two or more partial products based at least in part on the first input and the second input. Furthermore, the carry chain includes one or more binary ripple-carry adders configured to generate a product of the multiplier circuitry based at least in part on a sum of the two or more partial products.

Type: Grant

Filed: March 22, 2018

Date of Patent: November 2, 2021

Assignee: Intel Corporation

Inventors: Martin Langhammer, Gregg William Baeckler
