Multiplication Followed By Addition (i.e., X*y+z) Patents (Class 708/523)
  • Patent number: 10528346
    Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: January 7, 2020
    Assignee: Intel Corporation
    Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
  • Patent number: 10372417
    Abstract: Disclosed herein is a computer implemented method for performing multiply-add operations of binary numbers P, Q, R, S, B in an arithmetic unit of a processor, the operation calculating a result as an accumulated sum, which equals to B+n×P×Q+m×R×S, where n and m are natural numbers. Further disclosed herein is an arithmetic unit configured to implement multiply-add operations of binary numbers P, Q, R, S, B comprising at least a first binary arithmetic unit for calculating an aligned high part result and a second binary arithmetic unit for calculating an aligned low part result of the multiply-add operations.
    Type: Grant
    Filed: July 13, 2017
    Date of Patent: August 6, 2019
    Assignee: International Business Machines Corporation
    Inventors: Tina Babinsky, Michael Klein, Cedric Lichtenau, Silvia M. Mueller
  • Patent number: 10365860
    Abstract: A circuit that includes a plurality of array cores, each array core of the plurality of array cores comprising: a plurality of distinct data processing circuits; and a data queue register file; a plurality of border cores, each border core of the plurality of border cores comprising: at least a register file, wherein: [i] at least a subset of the plurality of border cores encompasses a periphery of a first subset of the plurality of array cores; and [ii] a combination of the plurality of array cores and the plurality of border cores define an integrated circuit array.
    Type: Grant
    Filed: March 1, 2019
    Date of Patent: July 30, 2019
    Assignee: quadric.io, Inc.
    Inventors: Nigel Drego, Aman Sikka, Mrinalini Ravichandran, Ananth Durbha, Robert Daniel Firu, Veerbhan Kheterpal
  • Patent number: 10338925
    Abstract: Tensor register files in a hardware accelerator are disclosed. An apparatus may comprise tensor operation calculators each configured to perform a type of tensor operation. The apparatus may also comprises tensor register files, each of which is associated with one of the tensor operation calculators. The apparatus may also comprises logic configured to store respective ones of the tensors in the plurality of tensor register files in accordance with the type of tensor operation to be performed on the respective tensors. The apparatus may also control read access to tensor register files based on a type of tensor operation that a machine instruction is to perform.
    Type: Grant
    Filed: May 24, 2017
    Date of Patent: July 2, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Halden Fowers, Steven Karl Reinhardt, Kalin Ovtcharov, Eric Sen Chung
  • Patent number: 10261796
    Abstract: A processor and a method for executing an instruction on a processor are provided. In the method, a to-be-executed instruction is fetched, the instruction including a source address field, a destination address field, an operation type field, and an operation parameter field; in at least one execution unit, an execution unit controlled by a to-be-generated control signal according to the operation type field is determined, a source address and a destination address of data operated by the execution unit are determined according to the source address field and the destination address field, and a data amount of the data operated by the execution unit controlled by the to-be-generated control signal is determined according to the operation parameter field; the control signal is generated; and the execution unit in the at least one execution unit is controlled by using the control signal.
    Type: Grant
    Filed: November 23, 2016
    Date of Patent: April 16, 2019
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD
    Inventors: Jian Ouyang, Wei Qi, Yong Wang
  • Patent number: 10198263
    Abstract: Apparatus and methods are disclosed for nullifying one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain a register identification of at least one of a plurality of registers, based on a target field of the nullification instruction. A write to the at least one register associated with the register identification is nullified. The nullification instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified write to the at least one register, a subsequent instruction is executed from a second, different instruction block.
    Type: Grant
    Filed: March 3, 2016
    Date of Patent: February 5, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
  • Patent number: 10169297
    Abstract: In one example in accordance with the present disclosure a resistive memory array is described. The array includes a number of resistive memory elements to receive a common-valued read signal. The array also includes a number of multiplication engines to perform a multiply operation by receiving a memory element output from a corresponding resistive memory element, receiving an input signal, and generating a multiplication output based on a received memory element output and a received input signal. The array also includes an accumulation engine to sum multiplication outputs from the number of multiplication engines.
    Type: Grant
    Filed: April 16, 2015
    Date of Patent: January 1, 2019
    Assignee: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
    Inventor: Brent Buchanan
  • Patent number: 10152456
    Abstract: A correlation operation circuit includes a first SRAM storing a plurality of pieces of detection pattern data, product-sum operators, a second SRAM storing intermediate data, and a comparator. When time series data is sequentially input, the intermediate data of all correlation functions referring to one time series data in a period during which the one time series data is input. When one time series data is input, the product-sum operator multiplies the detection pattern data sequentially read from the first SRAM by the one input time series data. The corresponding intermediate data is read from the second SRAM in synchronization with the multiplication, and the sequentially-calculated products are cumulatively added to the read intermediate data to be written back into the second SRAM as the intermediate data. As a result, the calculated correlation function data is supplied to the comparator to be compared with a predetermined specified value.
    Type: Grant
    Filed: May 1, 2017
    Date of Patent: December 11, 2018
    Assignee: Renesas Electronics Corporation
    Inventor: Hiroshi Ueki
  • Patent number: 10146248
    Abstract: A model calculation unit for calculating a data-based function model in a control unit is provided, the model calculation unit having a processor core which includes: a multiplication unit for carrying out a multiplication on the hardware side; an addition unit for carrying out an addition on the hardware side; an exponential function unit for calculating an exponential function on the hardware side; a memory in the form of a configuration register for storing hyperparameters and node data of the data-based function model to be calculated; and a logic circuit for controlling, on the hardware side, the calculation sequence in the multiplication unit, the addition unit, the exponential function unit and the memory in order to ascertain the data-based function model.
    Type: Grant
    Filed: April 7, 2014
    Date of Patent: December 4, 2018
    Assignee: ROBERT BOSCH GMBH
    Inventors: Tobias Lang, Heiner Markert, Axel Aue, Wolfgang Fischer, Ulrich Schulmeister, Nico Bannow, Felix Streichert, Andre Guntoro, Christian Fleck, Anne Von Vietinghoff, Michael Saetzler, Michael Hanselmann, Matthias Schreiber
  • Patent number: 10140090
    Abstract: Methods, systems and computer program products for computing and summing up multiple products in a single multiplier are provided. Aspects include receiving a first number and a second number, creating partial products of the first number and the second number based on a multiplication of the first number and the second number, and reducing the number of partial products to create an intermediate result. Aspects also include receiving a third number and a fourth number, creating partial products of the third number and the fourth number based on a multiplication of the third number and the fourth number, creating a reduction tree and adding the intermediate result to the reduction tree. Aspects further include reducing the number of partial products in the reduction tree to create a second sum value and a second carry value and adding the second sum value and the second carry value to create a result.
    Type: Grant
    Filed: September 28, 2016
    Date of Patent: November 27, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Klein, Manuela Niekisch
  • Patent number: 10140251
    Abstract: A processor and a method for executing a matrix multiplication operation on a processor. A specific implementation of the processor includes a data bus and an array processor having k processing units. The data bus is configured to sequentially read n columns of row vectors from an M×N multiplicand matrix and input same to each processing unit in the array processor, read an n×k submatrix from an N×K multiplier matrix and input each column vector of the submatrix to a corresponding processing unit in the array processor, and output a result obtained by each processing unit after executing a multiplication operation. Each processing unit in the array processor is configured to execute in parallel a vector multiplication operation on the input row and column vectors. Each processing unit includes a Wallace tree multiplier having n multipliers and n?1 adders. This implementation improves the processing efficiency of a matrix multiplication operation.
    Type: Grant
    Filed: May 9, 2017
    Date of Patent: November 27, 2018
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Ni Zhou, Wei Qi, Yong Wang, Jian Ouyang
  • Patent number: 10127013
    Abstract: Integrated circuits with specialized processing blocks that can support both fixed-point and floating-point operations are provided. A specialized processing block of this type may include partial product generators, compression circuits, and a main adder. The main adder may include a high adder, a middle adder, a low adder, floating-point rounding circuitry, and associated selection circuitry. The middle adder may include prefix networks for outputting generate and propagate vectors, and redundant LSB processing logic for outputting LSB generate and propagate bits. The middle adder may include additional logic circuitry for generating a sum output, a sum-plus-1 output, and a sum-plus-2 output. The specialized processing block may further include accumulation circuitry for support multiply-accumulation functions for any suitable number of channels.
    Type: Grant
    Filed: December 23, 2016
    Date of Patent: November 13, 2018
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Patent number: 10108581
    Abstract: A vector reduction circuit configured to reduce an input vector of elements comprises a plurality of cells, wherein each of the plurality of cells other than a designated first cell that receives a designated first element of the input vector is configured to receive a particular element of the input vector, receive, from another of the one or more cells, a temporary reduction element, perform a reduction operation using the particular element and the temporary reduction element, and provide, as a new temporary reduction element, a result of performing the reduction operation using the particular element and the temporary reduction element. The vector reduction circuit also comprises an output circuit configured to provide, for output as a reduction of the input vector, a new temporary reduction element corresponding to a result of performing the reduction operation using a last element of the input vector.
    Type: Grant
    Filed: April 3, 2017
    Date of Patent: October 23, 2018
    Assignee: Google LLC
    Inventors: Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam
  • Patent number: 10097185
    Abstract: In an example embodiment, a digital block comprises a datapath circuit, one or more programmable logic devices (PLDs), and one or more control registers. The datapath circuit comprises structural arithmetic elements. The one or more PLDs comprise uncommitted programmable logic. The one or more control circuits comprise a control register configured to store user-defined control bits, where the one or more control circuits are configured to control both the structural arithmetic elements and the uncommitted programmable logic based on the user-defined control bits.
    Type: Grant
    Filed: December 20, 2017
    Date of Patent: October 9, 2018
    Assignee: Cypress Semiconductor Corporation
    Inventors: Bert Sullam, Warren Snyder, Haneef Mohammed
  • Patent number: 10089078
    Abstract: A circuit includes a multiplier, an adder, a first result register and a second result register coupled to outputs of the multiplier and the adder, respectively. The circuit further includes: a first selection unit configured to selectively provide, to the multiplier and in response to a first control signal, a first value from a first plurality of values; and a second selection unit configured to selectively provide, to the multiplier and in response to a second control signal, a second value from a second plurality of values. The circuit also includes: a third selection unit configured to selectively provide, to the adder and in response to a third control signal, a third value from a third plurality of values; and a fourth selection unit configured to selectively provide, to the adder and in response to a fourth control signal, a fourth value from a fourth plurality of values.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: October 2, 2018
    Assignee: STMICROELECTRONICS S.R.L.
    Inventors: David Vincenzoni, Samuele Raffaelli
  • Patent number: 10042639
    Abstract: According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.
    Type: Grant
    Filed: January 3, 2017
    Date of Patent: August 7, 2018
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, Erdinc Ozturk, James D. Guilford, Gilbert M. Wolrich
  • Patent number: 10037210
    Abstract: An apparatus is described that includes a semiconductor chip having an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where, the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where, the first and second input operands are respective elements of first and second input vectors; and, c) execute an add instruction where a carry term of the add instruction's adding is recorded in a mask register.
    Type: Grant
    Filed: September 6, 2016
    Date of Patent: July 31, 2018
    Assignee: INTEL CORPORATION
    Inventors: Gilbert M. Wolrich, Kirk S. Yap, James D. Guilford, Erdinc Ozturk, Vinodh Gopal, Wajdi K. Feghali, Sean M. Gulley, Martin G. Dixon
  • Patent number: 9946612
    Abstract: Implementations of encoding techniques are disclosed. In one embodiment, an encoding system includes a codec device, a switching network, a rerouting circuit, a logic integrated circuit, and memory devices. The codec device includes a plurality of input and output (I/O) ports to transport data signals. The switching network is coupled both to the plurality of I/O ports and to a plurality of channels external to the device. The plurality of I/O ports includes at least one spare channel. The rerouting circuitry is coupled to and configured to control the switching network and the logic integrated circuit has logic circuity including command and decode queueing circuitry, redundancy circuits, and error correction circuitry. The memory devices do include any circuitry included in the logic circuitry. Other systems and apparatuses are also described.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: April 17, 2018
    Assignee: Micron Technology, Inc.
    Inventor: Timothy M. Hollis
  • Patent number: 9760110
    Abstract: Methods and systems for memory-based computing include combining multiple operations into a single lookup table and combining multiple memory-based operation requests into a single read request. Operation result values are read from a multi-operation lookup table that includes result values for a first operation above a diagonal of the lookup table and includes result values for a second operation below the diagonal. Numerical inputs are used as column and row addresses in the lookup table and the requested operation determines which input corresponds to the column address and which input corresponds to the row address. Multiple operations are combined into a single request by combining respective members from each operation into respective inputs an reading an operation result value from a lookup table to produce a combined result output. The combined result output is separated into a plurality of individual result outputs corresponding to the plurality of requests.
    Type: Grant
    Filed: February 4, 2016
    Date of Patent: September 12, 2017
    Assignee: International Business Machines Corporation
    Inventors: Minsik Cho, Ruchir Puri
  • Patent number: 9753695
    Abstract: A datapath circuit may include a digital multiply and accumulate circuit (MAC) and a digital hardware calculator for parallel computation. The digital hardware calculator and the MAC may be coupled to an input memory element for receipt of input operands. The MAC may include a digital multiplier structure with partial product generators coupled to an adder to multiply a first and second input operands and generate a multiplication result. The digital hardware calculator may include a first look-up table coupled between a calculator input and a calculator output register. The first look-up table may include table entry values mapped to corresponding math function results in accordance with a first predetermined mathematical function. The digital hardware calculator may be configured to calculate, based on the first look-up table, a computationally hard mathematical function such as a logarithm function, an exponential function, a division function and a square root function.
    Type: Grant
    Filed: August 27, 2013
    Date of Patent: September 5, 2017
    Assignee: Analog Devices Global
    Inventors: Mikael M. Mortensen, Jeffrey G. Bernstein
  • Patent number: 9743082
    Abstract: The present invention relates to an apparatus and method for encoding and decoding an image by skip encoding. The image-encoding method by skip encoding, which performs intra-prediction, comprises: performing a filtering operation on the signal which is reconstructed prior to an encoding object signal in an encoding object image; using the filtered reconstructed signal to generate a prediction signal for the encoding object signal; setting the generated prediction signal as a reconstruction signal for the encoding object signal; and not encoding the residual signal which can be generated on the basis of the difference between the encoding object signal and the prediction signal, thereby performing skip encoding on the encoding object signal.
    Type: Grant
    Filed: March 10, 2015
    Date of Patent: August 22, 2017
    Assignees: Electronics and Telecommunications Research Institute, Kwangwoon University Industry-Academic Collaboration Foundation, University-Industry Cooperation Group Of Kyung Hee University
    Inventors: Sung Chang Lim, Ha Hyun Lee, Se Yoon Jeong, Hui Yong Kim, Suk Hee Cho, Jong Ho Kim, Jin Ho Lee, Jin Soo Choi, Jin Woong Kim, Chie Teuk Ahn, Dong Gyu Sim, Seoung Jun Oh, Gwang Hoon Park, Sea Nae Park, Chan Woong Jeon
  • Patent number: 9690579
    Abstract: A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output.
    Type: Grant
    Filed: December 29, 2014
    Date of Patent: June 27, 2017
    Assignee: ARM Finance Overseas Limited
    Inventor: David Yiu-Man Lau
  • Patent number: 9692579
    Abstract: According to some embodiments, a secondary network node detects a first data transmission of media content from a primary network node to a first wireless device. The first data transmission has a first data quality description D(n1) and a first transport format T(k1). The secondary network node selects a second data quality description D(n2?) and a second transport format T(k2?) for a second data transmission. The second data quality description D(n2?) and second transport format T(k2?) differ from the first data quality description D(1) and first transport format T(k1), respectively. The secondary network node transmits the second data transmission to a second wireless device according to the second data quality description D(n2?) and the second transport format T(k2?). The second data transmission includes at least a portion of the media content.
    Type: Grant
    Filed: August 5, 2014
    Date of Patent: June 27, 2017
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventor: Ali S. Khayrallah
  • Patent number: 9535706
    Abstract: According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.
    Type: Grant
    Filed: March 22, 2016
    Date of Patent: January 3, 2017
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, Erdinc Ozturk, James D. Guilford, Gilbert M. Wolrich
  • Patent number: 9519460
    Abstract: A single-instruction multiple-data (SIMD) multiplier-accumulator apparatus and method. A multiplier block with two 16-bit by 32-bit multiplier circuits transform a selectable number of input multipliers and multiplicands into a selected number of products. Each multiplier circuit comprises an array of full adders that generates and sums partial products using carry-save addition. An accumulator block, with additional data width to help prevent overflow, adds the products to a selectable number of input addends and outputs a number of results. Embodiments perform one to four multiplications together, depending on the number of bits (eight, 16, 24, or 32) selected for the input operands. Embodiments output 20-bit, 40-bit, or 80-bit multiply-accumulate results at rates of at least 1.1 GHz. Embodiments support signed inputs, negated multiplication products, and Q-format data. A hybrid sign extension management approach improves performance for 80-bit outputs.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: December 13, 2016
    Assignee: Cadence Design Systems, Inc.
    Inventors: Aamir A. Farooqui, David Lawrence Heine
  • Patent number: 9495154
    Abstract: Embodiments disclosed herein include vector processing engines (VPEs) having programmable data path configurations for providing multi-mode vector processing. Related vector processors, systems, and methods are also disclosed. The VPEs include a vector processing stage(s) configured to process vector data according to a vector instruction executed in the vector processing stage. Each vector processing stage includes vector processing blocks each configured to process vector data based on the vector instruction being executed. The vector processing blocks are capable of providing different vector operations for different types of vector instructions based on data path configurations. Data paths of the vector processing blocks are programmable to be reprogrammable to process vector data differently according to the particular vector instruction being executed.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: November 15, 2016
    Assignee: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Patent number: 9483442
    Abstract: According to an embodiment, a matrix operation apparatus executing a matrix operation includes multiple nodes, the nodes including: a multiplier configured to perform a first operation for a first input, which is column data and a second input which is row data for the matrix operation and output element components of an operation result of the matrix operation; and an accumulator configured to perform cumulative addition of operation results of the multiplier.
    Type: Grant
    Filed: February 28, 2014
    Date of Patent: November 1, 2016
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Seiji Maeda, Hiroyuki Usui
  • Patent number: 9465578
    Abstract: A system and method are provided for performing 32-bit or dual 16-bit floating-point arithmetic operations using logic circuitry. An operating mode that specifies an operating mode for a multiplication operation is received, where the operating mode is one of a 32-bit floating-point mode and a dual 16-bit floating-point mode. Based on the operating mode, nine recoding terms for a mantissa of at least one floating-point input operand are determined. A dual-mode multiplier array circuit that is configurable to generate partial products for either one 32-bit floating-point result or for two 16-bit floating-point results computes the partial products based on the nine recoding terms. The partial products are processed to generate an output based on the operating mode.
    Type: Grant
    Filed: December 13, 2013
    Date of Patent: October 11, 2016
    Assignee: NVIDIA Corporation
    Inventors: David C. Tannenbaum, Srinivasan Iyer
  • Patent number: 9448766
    Abstract: An arithmetic logic stage in a graphics pipeline includes a number of arithmetic logic units (ALUs). The ALUs each include, for example, a multiplier and an adder. The ALUs are interconnected by circuitry that, for example, routes the output from the multiplier in one ALU to both the adder in that ALU and an adder in another ALU.
    Type: Grant
    Filed: August 27, 2013
    Date of Patent: September 20, 2016
    Assignee: NVIDIA Corporation
    Inventors: Tyson Bergland, Michael J. M. Toksvig, Justin Michael Mahan
  • Patent number: 9436435
    Abstract: An apparatus is described that includes a semiconductor chip having an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where, the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where, the first and second input operands are respective elements of first and second input vectors; and, c) execute an add instruction where a carry term of the add instruction's adding is recorded in a mask register.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: September 6, 2016
    Assignee: Intel Corporation
    Inventors: Gilbert M. Wolrich, Kirk S. Yap, James D. Guilford, Erdinc Ozturk, Vinodh Gopal, Wajdi K. Feghali, Sean M. Gulley, Martin G. Dixon
  • Patent number: 9430190
    Abstract: A method for operating a fused-multiply-add pipeline in a floating-point unit of a processor is disclosed. A multiplication is initially performed between a first operand and a second operand in a multiplier block to obtain a set of partial product results. The partial product results are sent to a carry-save adder block. A partial product reduction is performed on the partial product results to generate a carry-save result having a sum term and a carry term. The carry-save result is then formatted to generate a carry-out bit. The carry-save result is added to a third operand to generate a final result.
    Type: Grant
    Filed: January 31, 2014
    Date of Patent: August 30, 2016
    Assignee: International Business Machines Corporation
    Inventors: Son Dao Trong, Michael Klein, Christophe Layer, Silvia M. Mueller
  • Patent number: 9417843
    Abstract: Techniques are disclosed relating to performing extended multiplies without a carry flag. In one embodiment, an apparatus includes a multiply unit configured to perform multiplications of operands having a particular width. In this embodiment, the apparatus also includes multiple storage elements configured to store operands for the multiply unit. In this embodiment, each of the storage elements is configured to provide a portion of a stored operand that is less than an entirety of the stored operand in response to a control signal from the apparatus. In one embodiment, the apparatus is configured to perform a multiplication of given first and second operands having a width greater than the particular width by performing a sequence of multiply operations using the multiply unit, using portions of the stored operands and without using a carry flag between any of the sequence of multiply operations.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: August 16, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
  • Patent number: 9323500
    Abstract: In an embodiment, a fused multiply-add (FMA) circuit is configured to receive a plurality of input data values to perform an FMA instruction on the input data values. The circuit includes a multiplier unit and an adder unit coupled to an output of the multiplier unit, and a control logic to receive the input data values and to reduce switching activity and thus reduce power consumption of one or more components of the circuit based on a value of one or more of the input data values. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 5, 2013
    Date of Patent: April 26, 2016
    Assignee: Intel Corporation
    Inventors: Brian Hickmann, Dennis Bradford, Thomas Fletcher
  • Patent number: 9292297
    Abstract: According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: March 22, 2016
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, Erdinc Ozturk, James D. Guilford, Gilbert M. Wolrich
  • Patent number: 9207908
    Abstract: A specialized processing block on an integrated circuit includes a first and second arithmetic operator stage, an output coupled to another specialized processing block, and configurable interconnect circuitry which may be configured to route signals throughout the specialized processing block, including in and out of the first and second arithmetic operator stages. The configurable interconnect circuitry may further include multiplexer circuitry to route selected signals. The output of the specialized processing block that is coupled to another specialized processing block together with the configurable interconnect circuitry reduces the need to use resources outside the specialized processing block when implementing mathematical functions that require the use of more than one specialized processing block. An example for such mathematical functions include the implementation of vector (dot product) operations, FIR filters, or sum-of-product operations.
    Type: Grant
    Filed: January 29, 2013
    Date of Patent: December 8, 2015
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Patent number: 9143189
    Abstract: Disclosed are various embodiments providing selecting various schemes for combining antenna signals for synchronization. For example, processing circuitry determines a first weight for a first antenna based on a signal strength metric of the first antenna and a second weight for a second antenna based on a signal strength metric of the second antenna. The processing circuitry receives a synchronization signal via the first antenna and the second antenna. Then the processing circuitry may weight the synchronization signal received via the first antenna according to the first weight to generate a first weighted synchronization signal and weight the synchronization signal received via the second antenna according to the second weight to generate a second weighted synchronization signal. The processing circuitry further combines the first weighted synchronization signal and the second weighted synchronization signal to generate a combined synchronization signal.
    Type: Grant
    Filed: June 19, 2012
    Date of Patent: September 22, 2015
    Assignee: BROADCOM CORPORATION
    Inventor: Shuangquan Wang
  • Publication number: 20150095394
    Abstract: One embodiment of the present invention includes a method for simplifying arithmetic operations by detecting operands with elementary values such as zero or 1.0. Computer and graphics processing systems perform a great number of multiply-add operations. In a significant portion of these operations, the values of one or more of the operands are zero or 1.0. By detecting the occurrence of these elementary values, math operations can be greatly simplified, for example by eliminating multiply operations when one multiplicand is zero or 1.0 or eliminating add operations when one addend is zero. The simplified math operations resulting from detecting elementary valued operands provide significant savings in overhead power, dynamic processing power, and cycle time.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Daniel FINCHELSTEIN, David Conrad TANNENBAUM, Srinivasan (Vasu) IYER
  • Publication number: 20150095395
    Abstract: According to one embodiment, a processing device for multiplying a first polynomial with a second polynomial is described including a first memory storing a representation of the first polynomial, a controller configured to separate the first polynomial into parts, a second memory storing pre-determined results of the multiplications of the second polynomial with possible forms of the parts of the first polynomial, a third memory for storing the result of the multiplication, an address logic, configured to determine, for each part of the first polynomial, a start address of a memory block of the second memory based on the form of the part and the location of the part within the first polynomial and an adder configured to add, for each determined address of the memory block of the second memory, the content of the memory block of the second memory at least partially to the contents of the third memory, wherein the data element of the third memory to which the content of a data element of the memory block of th
    Type: Application
    Filed: October 2, 2013
    Publication date: April 2, 2015
    Applicant: Infineon Technologies AG
    Inventors: Andrea Hoeller, Tomaz Felicijan
  • Publication number: 20150088947
    Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.
    Type: Application
    Filed: December 3, 2014
    Publication date: March 26, 2015
    Applicant: INTEL CORPORATION
    Inventors: Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan, Amit Gradstein
  • Patent number: 8990282
    Abstract: A fused multiply add floating point unit 1 includes multiplying circuitry 4 and adding circuitry 8. The multiply circuitry 4 multiplies operands B and C having N-bit significands to generate an unrounded product B*C. The unrounded product B*C has an M-bit significand, where M>N. The adding circuitry 8 receives an operand A that is input at a later processing cycle than a processing cycle at which the multiplying circuitry 4 receives operands B and C. The adding circuitry 8 commences processing of the operand A after the unrounded product B*C is generated by the multiplying circuitry 4. The adding circuitry 8 adds the operand A to the unrounded product B*C and outputs a rounded result A+B*C.
    Type: Grant
    Filed: September 21, 2009
    Date of Patent: March 24, 2015
    Assignee: ARM Limited
    Inventor: David Raymond Lutz
  • Patent number: 8990283
    Abstract: A computer processor including a single fused-unfused floating point multiply-add (FMA) module computes the result of the operation A*B+C for floating point numbers for fused multiply-add rounding operations and unfused multiply-add rounding operations. In one embodiment, a fused multiply-add rounding implementation is augmented with additional hardware which calculates an unfused multiply-add rounding result without adding additional pipeline stages. In one embodiment, a computation by the fused-unfused floating point multiply-add (FMA) module is initiated using a single opcode which determines whether a fused multiply-add rounding result or unfused multiply-add rounding result is generated.
    Type: Grant
    Filed: October 24, 2011
    Date of Patent: March 24, 2015
    Assignee: Oracle America, Inc.
    Inventors: Murali K. Inaganti, Leonard D. Rarick
  • Publication number: 20150081753
    Abstract: One embodiment of the present invention includes a method for performing arithmetic operations on arbitrary width integers using fixed width elements. The method includes receiving a plurality of input operands, segmenting each input operand into multiple sectors, performing a plurality of multiply-add operations based on the multiple sectors to generate a plurality of multiply-add operation results, and combining the multiply-add operation results to generate a final result. One advantage of the disclosed embodiments is that, by using a common fused floating point multiply-add unit to perform arithmetic operations on integers of arbitrary width, the method avoids the area and power penalty of having additional dedicated integer units.
    Type: Application
    Filed: September 13, 2013
    Publication date: March 19, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Srinivasan (Vasu) IYER, Michael Alan FETTERMAN, David Conrad TANNENBAUM
  • Patent number: 8977670
    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
    Type: Grant
    Filed: May 11, 2012
    Date of Patent: March 10, 2015
    Assignee: Oracle International Corporation
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
  • Publication number: 20150058391
    Abstract: A processor includes a carry save array multiplier. The carry save array multiplier includes an array of cascaded partial product generators. The array of cascaded partial product generators is configured to generate an output value as a product of two operands presented at inputs of the multiplier. The array of cascaded partial product generators is also configured to generate an output value as a sum of two operands presented at inputs of the multiplier.
    Type: Application
    Filed: August 23, 2013
    Publication date: February 26, 2015
    Applicant: TEXAS INSTRUMENTS DEUTSCHLAND GMBH
    Inventors: Christian Wiencke, Armin Stingl
  • Patent number: 8930434
    Abstract: The principles of the present invention relate to a multiply and divide circuit configured to interactively multiply and/or divide. The circuit may handle signed and unsigned values. The circuit comprises an instruction register configured to store a multiply or divide instruction, at one input register configured to store the multiply or divide operands, an Arithmetic Logic Unit (“ALU”) configured to add provided values, and configuration circuitry. The configuration circuitry responds to the instructions and performs the multiply or divide operation by iteratively providing values to the ALU.
    Type: Grant
    Filed: September 12, 2006
    Date of Patent: January 6, 2015
    Assignee: Finisar Corporation
    Inventor: Gerald L. Dybsetter
  • Publication number: 20140379774
    Abstract: The system has first, second, third, and fourth subsystems. Each subsystem has first and second multipliers coupled, respectively, to first and second adders. Each multiplier has two inputs. The first adder is coupled to a first output, a first accumulator, and a bit shifter. The bit shifter is coupled to a third adder. The third adder is coupled to a multiplexer. The multiplexer is coupled to a second output and a second accumulator. The second adder is coupled to the third adder and the multiplexer. The first outputs of the first and second subsystems are coupled directly to a fourth adder, the second outputs of the first and second subsystems are coupled directly to a fifth adder, the first outputs of the third and fourth subsystems are coupled directly to a sixth adder, and the second outputs of the third and fourth subsystems are coupled directly to a seventh adder.
    Type: Application
    Filed: June 21, 2013
    Publication date: December 25, 2014
    Inventors: Niraj Gupta, Karthik N
  • Publication number: 20140365548
    Abstract: In at least one example embodiment, a microprocessor circuit is provided that includes a microprocessor core coupled to a data memory via a data memory bus comprising a predetermined integer number of data wires (J); the single-ported data memory configured for storage of vector input elements of an N element vector in a predetermined vector element order and storage of matrix input elements of an M×N matrix comprising M columns of matrix input elements and N rows of matrix input elements; a vector matrix product accelerator comprising a datapath configured for multiplying the N element vector and the matrix to compute an M element result vector, the vector matrix product accelerator comprising: an input/output port interfacing the data memory bus to the vector matrix product accelerator; a plurality of vector input registers for storage respective input vector elements received through the input/output port.
    Type: Application
    Filed: June 11, 2013
    Publication date: December 11, 2014
    Applicant: ANALOG DEVICES TECHNOLOGY
    Inventor: Mikael Mortensen
  • Patent number: 8903882
    Abstract: Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data busses for the input operands and an output data bus for an overall calculation result, each bus including a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure and calculating the at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. A certain number of multiply-sums may be output as an overall calculation result dependent on mode of operation using the full width of said output data bus.
    Type: Grant
    Filed: July 15, 2011
    Date of Patent: December 2, 2014
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Jens Leenstra, Tim Niggemeier, Philipp Oehler, Philipp Panitz
  • Publication number: 20140324936
    Abstract: Processors and methods for solving mathematical equations are disclosed herein. An embodiment of the processor includes a hardware device that calculates coefficients based on a mathematical operation that is to be performed. An indexing device transmits the coefficients to and from a look up table. A hardware multiplier multiplies certain coefficients by the derivative of a function related to the mathematical operation. A hardware adder adds a first coefficient to the product of a second coefficient and the first order derivative of the function.
    Type: Application
    Filed: August 20, 2013
    Publication date: October 30, 2014
    Applicant: Texas Instruments Incorporated
    Inventors: Tessarolo Alexander, Chirag Gupta
  • Publication number: 20140289300
    Abstract: In a processor that includes a plurality of multipliers and a plurality of adders to execute matrix product processing, each data of input vector data involved in the arithmetic processing is used in two multipliers, whereby arithmetic processing of elements in different rows and different columns in a matrix product operation is executed with a single instruction, that enables the sharing of input data to reduce the number of times data are moved in the whole matrix product processing and reduce power consumption.
    Type: Application
    Filed: January 21, 2014
    Publication date: September 25, 2014
    Applicant: FUJITSU LIMITED
    Inventor: Yuichiro Ajima