Patents by Inventor Martin Langhammer
Martin Langhammer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240126558
Abstract: A processor includes a shared memory, and an instruction unit to receive a single instruction, multiple thread (SIMT) instruction having a first source register identifier and a second source register identifier. The SIMT instruction indicates a number of data values to be written to the shared memory concurrently. A SIMT processor includes processor elements each to execute instructions of a different corresponding thread of a parallel thread group. Each of a number of processor elements, equal in number to the number of data values, is to execute the SIMT instruction to concurrently write a different corresponding one of the number of data values from a first source register of the respective processor element identified by the first source register identifier to the shared memory at an address based on address information from a second source register of the respective processor element identified by the second source register identifier.
Type: Application
Filed: December 20, 2023
Publication date: April 18, 2024
Inventor: Martin LANGHAMMER
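The behavior this abstract describes can be illustrated with a small software model. The sketch below is a toy simulation, not Intel's hardware: each of the first `count` "processor elements" writes the value in its first source register to shared memory at the address held in its second source register (the register names `r1`/`r2` and the data layout are our assumptions).

```python
def simt_store(shared_mem, threads, count, src1="r1", src2="r2"):
    """Toy model of the described SIMT store: the first `count`
    threads each write regs[src1] to shared_mem[regs[src2]];
    remaining threads are inactive for this instruction."""
    for regs in threads[:count]:
        shared_mem[regs[src2]] = regs[src1]
    return shared_mem

# Example: four threads, three of them active for this instruction.
threads = [{"r1": 10 * i, "r2": i} for i in range(4)]
mem = simt_store([0] * 8, threads, count=3)
```

In hardware these writes happen concurrently; the sequential loop here only models the net effect on shared memory.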
-
Patent number: 11960853
Abstract: Folded integer multiplier (FIM) circuitry includes a multiplier configurable to perform multiplication and a first addition/subtraction unit and a second addition/subtraction unit both configurable to perform addition and subtraction. The FIM circuitry is configurable to determine each product of a plurality of products for a plurality of pairs of input values having a first number of bits by performing, using the first and second addition/subtraction units, a plurality of operations involving addition or subtraction, and performing, using the multiplier, a plurality of multiplication operations involving values having fewer bits than the first number of bits. The plurality of multiplication operations includes a first number of multiplication operations, and the multiplier is configurable to begin performing all multiplication operations of the plurality of multiplication operations within a first number of clock cycles equal to the first number of multiplication operations.
Type: Grant
Filed: March 26, 2021
Date of Patent: April 16, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Bogdan Mihai Pasca
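The general shape of this technique — computing a wide product from narrower multiplications plus extra addition/subtraction work — is classically illustrated by one Karatsuba step. The sketch below is that textbook decomposition, offered only as an analogy for the trade-off the abstract describes, not as the patent's actual folding scheme.

```python
def karatsuba_step(a, b, half_bits=16):
    """One Karatsuba step: a 2n-bit product from three n-bit
    multiplications plus additions/subtractions. Illustrative of
    trading multiplier width for add/sub work; not the patented
    FIM circuit itself."""
    mask = (1 << half_bits) - 1
    a_hi, a_lo = a >> half_bits, a & mask
    b_hi, b_lo = b >> half_bits, b & mask
    z2 = a_hi * b_hi                               # high halves
    z0 = a_lo * b_lo                               # low halves
    z1 = (a_hi + a_lo) * (b_hi + b_lo) - z2 - z0   # cross terms via add/sub
    return (z2 << (2 * half_bits)) + (z1 << half_bits) + z0

assert karatsuba_step(0x12345678, 0x9ABCDEF0) == 0x12345678 * 0x9ABCDEF0
```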
-
Publication number: 20240118870
Abstract: Integrated circuit devices, methods, and circuitry for a digital signal processing (DSP) block that can selectively perform higher-precision DSP multiplication operations or lower-precision AI tensor multiplication operations. Flexible digital signal processing circuitry may include hardened multipliers, hardened summation circuitry, and an intermediate multiplexer network. The intermediate multiplexer network may be configurable to, in a first configuration, route data between the plurality of hardened multipliers and the hardened summation circuitry to perform a plurality of lower-precision multiplication operations. In a second configuration, the intermediate multiplexer network may route the data between the plurality of hardened multipliers and the hardened summation circuitry to perform at least one higher-precision multiplication operation.
Type: Application
Filed: September 27, 2023
Publication date: April 11, 2024
Inventor: Martin Langhammer
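The "second configuration" idea — ganging several low-precision multipliers into one higher-precision multiply — can be shown with the standard schoolbook decomposition of a 16×16 product into four 8×8 partial products. This is a generic illustration of precision composition, not the block's actual multiplexer network.

```python
def mul16_from_8x8(a, b):
    """Build one 16x16 product from four 8x8 partial products, the
    way a DSP block might combine its low-precision multipliers in a
    higher-precision mode (generic decomposition, not the patented
    routing)."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return ((a_hi * b_hi) << 16) + ((a_hi * b_lo + a_lo * b_hi) << 8) + a_lo * b_lo

assert mul16_from_8x8(0xBEEF, 0xCAFE) == 0xBEEF * 0xCAFE
```

In the first configuration the same four 8×8 multipliers would instead produce four independent low-precision products for tensor workloads.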
-
Publication number: 20240113699
Abstract: Integrated circuit devices, methods, and circuitry for implementing and using a flexible circuit for real and complex filter operations are provided. An integrated circuit may include programmable logic circuitry and digital signal processor (DSP) blocks. The DSP blocks may be configurable to receive inputs from the programmable logic circuitry and may include first and second multiplier pairs. The first multiplier pair may include a first multiplier that may receive a first input and a second input and a second multiplier that may receive the second input and a third input of the inputs. The second multiplier pair may include a third multiplier that may receive the first input or a fourth input and a fifth input and a fourth multiplier that may receive the third input or a fifth input and a sixth input.
Type: Application
Filed: September 30, 2022
Publication date: April 4, 2024
Inventor: Martin Langhammer
-
Publication number: 20240078211
Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
Type: Application
Filed: September 14, 2023
Publication date: March 7, 2024
Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
-
Patent number: 11907719
Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
Type: Grant
Filed: June 26, 2020
Date of Patent: February 20, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl
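"Multiply each value of the first plurality by each value of the second plurality" is an all-pairs (outer-product-style) computation. The sketch below models only that arithmetic pattern sequentially; in the DSP block every product is produced by its own multiplier in the same cycle.

```python
def broadcast_products(weights, activations):
    """All-pairs products as the abstract describes: every stored
    weight multiplied by every incoming value. Hardware does this in
    parallel with one multiplier per pair; this model is sequential."""
    return [[w * a for a in activations] for w in weights]

assert broadcast_products([1, 2], [3, 4, 5]) == [[3, 4, 5], [6, 8, 10]]
```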
-
Patent number: 11899746
Abstract: The present disclosure relates generally to techniques for efficiently performing operations associated with artificial intelligence (AI), machine learning (ML), and/or deep learning (DL) applications, such as training and/or inference calculations, using an integrated circuit device. More specifically, the present disclosure relates to an integrated circuit design implemented to perform these operations with low latency and/or a high bandwidth of data. For example, embodiments of a computationally dense digital signal processing (DSP) circuitry, implemented to efficiently perform one or more arithmetic operations (e.g., a dot-product) on an input, are disclosed. Moreover, embodiments described herein may relate to layout, design, and data scheduling of a processing element array implemented to compute matrix multiplications (e.g., systolic array multiplication).
Type: Grant
Filed: December 23, 2021
Date of Patent: February 13, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Andrei-Mihai Hagiescu-Miriste
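A fixed-width dot-product datapath of the kind the abstract mentions can be modeled as an engine that multiplies a fixed number of element pairs per "cycle" and accumulates the partial sums. The lane count of 4 and the sequential loop are our modeling assumptions, not details from the patent.

```python
def dot_product_unit(a, b, lanes=4):
    """Software stand-in for a hardened dot-product datapath:
    multiply `lanes` element pairs per 'cycle' and accumulate the
    partial sums (lane count is illustrative)."""
    acc = 0
    for i in range(0, len(a), lanes):
        partial = sum(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        acc += partial
    return acc

assert dot_product_unit([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]) == 35
```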
-
Publication number: 20230367547
Abstract: A first storage location is to store a first floating-point data element. The first data element has a sign bit, an N-bit first exponent value, and M bits. A second storage location is to store a second floating-point data element that is to have a same number of bits as the first floating-point data element. The second data element has a sign bit, an N-bit first exponent value, and M bits. The N-bit first exponent value of the second data element is all zeroes and the M bits of the second data element include a significand and a second exponent value. A floating-point arithmetic unit is coupled with the first and second storage locations. The floating-point arithmetic unit is to perform either multiplication or addition on the first and second data elements to generate a result data element based at least in part on the second exponent value.
Type: Application
Filed: June 15, 2023
Publication date: November 16, 2023
Inventor: Martin LANGHAMMER
-
Publication number: 20230368030
Abstract: Weights can be pruned during DNN training to increase sparsity in the weights and reduce the amount of computation required for performing the deep learning operations in DNNs. A DNN layer may have one or more weight tensors corresponding to one or more output channels of the layer. A weight tensor has weights, the values of which are determined by training the DNN. A weight tensor may have a dimension corresponding to the input channels of the layer. The weight tensor may be partitioned into subtensors, each of which has a subset of the input channels. The subtensors may have the same number of input channels. One or more subtensors may be selected, e.g., based on the weights in the one or more subtensors. The weights in a selected subtensor are pruned, e.g., changed to zeros. The weights in an unselected subtensor may be modified by further training the DNN.
Type: Application
Filed: July 25, 2023
Publication date: November 16, 2023
Inventors: Arnab Raha, Michael Wu, Deepak Abraham Mathaikutty, Martin Langhammer, Nihat Tunali
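The partition-select-prune flow reads like structured magnitude pruning. The sketch below is a minimal interpretation with assumed details: groups are ranked by L1 magnitude and the lowest-ranked groups are zeroed (the abstract leaves the selection criterion open, and real training would fine-tune the surviving weights afterward).

```python
def prune_subtensors(weight_rows, group_size, keep):
    """Partition input-channel rows into equal-size groups, rank the
    groups by L1 magnitude, and zero every group outside the `keep`
    largest. A sketch of the described selection, with the L1
    criterion assumed by us."""
    groups = [weight_rows[i:i + group_size]
              for i in range(0, len(weight_rows), group_size)]
    scores = [sum(abs(w) for row in g for w in row) for g in groups]
    keep_idx = set(sorted(range(len(groups)), key=lambda i: -scores[i])[:keep])
    pruned = []
    for gi, g in enumerate(groups):
        if gi in keep_idx:
            pruned.extend(g)                       # survives pruning
        else:
            pruned.extend([[0] * len(row) for row in g])  # pruned to zeros
    return pruned

# Four input channels in groups of two; keep only the heavier group.
w = [[1, -1], [2, 0], [10, 3], [-4, 5]]
pruned = prune_subtensors(w, group_size=2, keep=1)
```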
-
Patent number: 11809798
Abstract: The present disclosure describes an integrated circuit device that includes a digital signal processing (DSP) block. The DSP block includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Also, the first plurality of inputs, the second plurality of inputs, or both are derived from higher precision values. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
Type: Grant
Filed: June 26, 2020
Date of Patent: November 7, 2023
Assignee: Intel Corporation
Inventors: Martin Langhammer, Simon Peter Finn
-
Publication number: 20230342111
Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
Type: Application
Filed: June 30, 2023
Publication date: October 26, 2023
Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
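The multiply-at-first-precision, accumulate-at-second-precision pattern can be mimicked in software by rounding operands and products to a reduced significand width and then summing the partials exactly. The `round_sig` helper is our crude stand-in for a low-precision float format (real hardware would use IEEE fp16, bf16, or similar), and the 11-bit default is only an example.

```python
import math

def round_sig(x, bits):
    """Round x to `bits` significand bits: a crude stand-in for a
    lower-precision float format (assumption for illustration)."""
    if x == 0:
        return 0.0
    exp = math.floor(math.log2(abs(x)))
    scale = 2.0 ** (bits - 1 - exp)
    return round(x * scale) / scale

def dot_low_then_high(a, b, low_bits=11):
    """Multiply at a first (low) precision, then accumulate the
    partial results at higher precision, echoing the low-precision
    multiplier stage feeding a wider adder stage."""
    partials = [round_sig(round_sig(x, low_bits) * round_sig(y, low_bits),
                          low_bits)
                for x, y in zip(a, b)]
    return math.fsum(partials)  # higher-precision accumulation
```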
-
Patent number: 11797473
Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
Type: Grant
Filed: October 8, 2018
Date of Patent: October 24, 2023
Assignee: Altera Corporation
Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
-
Publication number: 20230333857
Abstract: A processor of an aspect includes an instruction unit to receive a single instruction, multiple thread (SIMT) instruction. The SIMT instruction has at least one field to provide at least one value. The at least one value is to indicate a plurality of threads that are to execute the SIMT instruction. The processor also includes a SIMT processor coupled with the instruction unit. The SIMT processor is to execute the SIMT instruction for each of the plurality of threads. Other processors, methods, systems, and machine-readable media storing such SIMT instructions are also disclosed.
Type: Application
Filed: June 22, 2023
Publication date: October 19, 2023
Inventor: Martin LANGHAMMER
-
Patent number: 11789641
Abstract: A three dimensional circuit system includes a first integrated circuit die having a core logic region that has first memory circuits and logic circuits. The three dimensional circuit system includes a second integrated circuit die that has second memory circuits. The first and second integrated circuit dies are coupled together in a vertically stacked configuration. The three dimensional circuit system includes third memory circuits coupled to the first integrated circuit die. The third memory circuits reside in a plane of the first integrated circuit die. The logic circuits are coupled to access the first, second, and third memory circuits and data can move between the first, second, and third memories. The third memory circuits have a larger memory capacity and a smaller memory access bandwidth than the second memory circuits. The second memory circuits have a larger memory capacity and a smaller memory access bandwidth than the first memory circuits.
Type: Grant
Filed: June 16, 2021
Date of Patent: October 17, 2023
Assignee: Intel Corporation
Inventors: Scott Weber, Jawad Khan, Ilya Ganusov, Martin Langhammer, Matthew Adiletta, Terence Magee, Albert Fazio, Richard Coulson, Ravi Gutala, Aravind Dasu, Mahesh Iyer
-
Publication number: 20230325665
Abstract: Gate switching in deep learning operations can be reduced based on sparsity in the input data. A first element of an activation operand and a first element of a weight operand may be stored in input storage units associated with a multiplier in a processing element. The multiplier computes a product of the two elements, which may be stored in an output storage unit of the multiplier. After detecting that a second element of the activation operand or a second element of the weight operand is zero valued, gate switching is reduced by avoiding at least one gate switching needed for the multiply-accumulation operation. For instance, the input storage units may not be updated. A zero-valued data element may be stored in the output storage unit of the multiplier and used as a product of the second element of the activation operand and the second element of the weight operand.
Type: Application
Filed: May 30, 2023
Publication date: October 12, 2023
Applicant: Intel Corporation
Inventors: Martin Langhammer, Arnab Raha, Martin Power
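A software analogy for this zero-gating: skip the input-register update entirely when either new operand is zero and substitute a known-zero product. The class below is a toy model in which an update counter stands in for gate-switching activity (the counter and class structure are our illustration, not the circuit).

```python
class GatedMultiplier:
    """Multiplier model that skips input-register updates when either
    new operand is zero, reusing a known-zero product instead - a
    software analogy for the gate-switching reduction described."""

    def __init__(self):
        self.a = self.b = self.out = 0
        self.updates = 0  # proxy for switching activity

    def multiply(self, a, b):
        if a == 0 or b == 0:
            return 0            # no register update; product is known
        self.a, self.b = a, b   # registers only toggle for nonzero pairs
        self.out = a * b
        self.updates += 1
        return self.out

m = GatedMultiplier()
acc = sum(m.multiply(a, w) for a, w in [(3, 2), (0, 7), (5, 0), (4, 1)])
```

With two of the four operand pairs containing a zero, only two register updates occur while the accumulated result is unchanged.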
-
Publication number: 20230273770
Abstract: Integrated circuit devices, methods, and circuitry for implementing and using an iterative multiplicative modular reduction circuit are provided. Such circuitry may include polynomial multiplication circuitry and modular reduction circuitry that may operate concurrently. The polynomial multiplication circuitry may multiply a first input value to a second input value to compute a product. The modular reduction circuitry may perform modular reduction on a first component of the product while the polynomial multiplication circuitry is still generating other components of the product.
Type: Application
Filed: March 16, 2023
Publication date: August 31, 2023
Inventors: Sergey Vladimirovich Gribok, Martin Langhammer, Bogdan Pasca
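The two operations being overlapped are standard: carry-less (GF(2) polynomial) multiplication and polynomial modular reduction. The sketch below implements both over Python integers used as bit-polynomials, but runs them in sequence; the patented circuit's contribution is reducing early product components while later ones are still being generated.

```python
def clmul(a, b):
    """Carry-less multiplication of bit-polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(p, mod):
    """Reduce polynomial p modulo `mod` over GF(2)."""
    dm = mod.bit_length() - 1
    while p.bit_length() - 1 >= dm:
        p ^= mod << (p.bit_length() - 1 - dm)
    return p

def modmul(a, b, mod):
    """Polynomial modular multiplication (sequential reference; the
    described circuit overlaps the two phases)."""
    return poly_mod(clmul(a, b), mod)

# In GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1 (0x11B),
# {53} and {CA} are multiplicative inverses.
assert modmul(0x53, 0xCA, 0x11B) == 1
```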
-
Patent number: 11726744
Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
Type: Grant
Filed: March 26, 2021
Date of Patent: August 15, 2023
Assignee: Intel Corporation
Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
-
Publication number: 20230239136
Abstract: Integrated circuits, methods, and circuitry are provided for performing multiplication such as that used in Galois field counter mode (GCM) hash computations. An integrated circuit may include selection circuitry to provide one of several powers of a hash key. A Galois field multiplier may receive the one of the powers of the hash key and a hash sequence and generate one or more values. The Galois field multiplier may include multiple levels of pipeline stages. An adder may receive the one or more values and provide a summation of the one or more values in computing a GCM hash.
Type: Application
Filed: March 31, 2023
Publication date: July 27, 2023
Inventors: Sergey Vladimirovich Gribok, Gregg William Baeckler, Bogdan Pasca, Martin Langhammer
-
Publication number: 20230229917
Abstract: A compute block can perform hybrid multiply-accumulate (MAC) operations. The compute block may include a weight compression module and a processing element (PE) array. The weight compression module may select a first group of one or more weights and a second group of one or more weights from a weight tensor of a DNN (deep neural network) layer. A weight in the first group is quantized to a power of two value. A weight in the second group is quantized to an integer. The integer and the exponent of the power of two value may be stored in a memory in lieu of the original values of the weights. A PE in the PE array includes a shifter configured to shift an activation of the layer by the exponent of the power of two value and a multiplier configured to multiply the integer with another activation of the layer.
Type: Application
Filed: March 15, 2023
Publication date: July 20, 2023
Applicant: Intel Corporation
Inventors: Michael Wu, Arnab Raha, Deepak Abraham Mathaikutty, Nihat Tunali, Martin Langhammer
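The two weight groups map to two arithmetic paths: a shifter for power-of-two weights (stored as exponents) and an integer multiplier for the rest. The sketch below assumes a simple tagged representation of the compressed weights; the `("pow2", exponent)` / `("int", value)` encoding is our own illustration.

```python
def hybrid_mac(activations, weights):
    """Hybrid MAC sketch: weights tagged 'pow2' carry an exponent and
    use a shift; weights tagged 'int' use a multiplier - echoing the
    two quantization groups in the abstract (tagging scheme assumed)."""
    acc = 0
    for a, (kind, value) in zip(activations, weights):
        if kind == "pow2":
            acc += a << value      # multiply by 2**value via the shifter
        else:
            acc += a * value       # integer multiplier path
    return acc

# Two power-of-two weights (8 and 1, stored as exponents 3 and 0)
# and one integer weight (5).
w = [("pow2", 3), ("int", 5), ("pow2", 0)]
result = hybrid_mac([2, 3, 4], w)  # 2*8 + 3*5 + 4*1
```

Replacing a full multiply with a shift for the power-of-two group is what saves area and energy in the PE.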
-
Publication number: 20230222275
Abstract: A method is provided for processing code for a circuit design for an integrated circuit using a computer system. The method includes receiving at least a portion of the code for the circuit design for the integrated circuit, wherein the portion of the code comprises an error or has incomplete constraints, making an assumption about the error and the missing constraints using a computer-aided design tool, and generating a revised circuit design for the integrated circuit with the error corrected and any missing constraints added based on the assumption and based on the code using the computer-aided design tool and a library of components for circuit designs.
Type: Application
Filed: March 16, 2023
Publication date: July 13, 2023
Applicant: Intel Corporation
Inventors: Gregg Baeckler, Mahesh A. Iyer, Martin Langhammer