Multiplication Followed By Addition (i.e., x*y+z) Patents (Class 708/523)

Apparatuses, methods, and systems for instructions of a matrix operations accelerator

Patent number: 12204605

Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described.

Type: Grant

Filed: July 27, 2023

Date of Patent: January 21, 2025

Assignee: Intel Corporation

Inventors: Amit Gradstein, Simon Rubanovich, Sagi Meller, Saeed Kharouf, Gavri Berger, Zeev Sperber, Jose Yallouz, Ron Schneider

Approximation of matrices for matrix multiply operations

Patent number: 12197533

Abstract: A processing device is provided which comprises memory configured to store data and a processor configured to receive a portion of data of a first matrix comprising a first plurality of elements and receive a portion of data of a second matrix comprising a second plurality of elements. The processor is also configured to determine values for a third matrix by dropping a number of products from products of pairs of elements of the first and second matrices based on approximating the products of the pairs of elements as a sum of the exponents of the pairs of elements and performing matrix multiplication on remaining products of the pairs of elements of the first and second matrices.

Type: Grant

Filed: March 26, 2021

Date of Patent: January 14, 2025

Assignee: Advanced Micro Devices, Inc.

Inventors: Pramod Vasant Argade, Swapnil P. Sakharshete, Maxim V. Kazakov, Alexander M. Potapov
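The abstract above (patent 12197533) describes ranking products by the sum of their operands' exponents and dropping some of them before multiplying. A minimal sketch of that idea follows, assuming a top-`keep` selection per output element; the names `approx_exponent_sum`, `approx_matmul`, and `keep` are hypothetical, and the patent's actual selection policy and hardware are not given in the abstract.

```python
import math


def approx_exponent_sum(a, b):
    """Estimate the magnitude of a*b as the sum of the binary exponents of a and b."""
    if a == 0 or b == 0:
        return -math.inf  # a zero product is always the first candidate to drop
    return math.frexp(a)[1] + math.frexp(b)[1]


def approx_matmul(A, B, keep):
    """Multiply A (rows x inner) by B (inner x cols), keeping only `keep` products per output.

    Assumption: for each output element, the products with the largest
    exponent-sum estimate are kept and the rest are dropped.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            ranked = sorted(
                range(inner),
                key=lambda k: approx_exponent_sum(A[i][k], B[k][j]),
                reverse=True,
            )
            # Only the surviving products are multiplied exactly and summed.
            C[i][j] = sum(A[i][k] * B[k][j] for k in ranked[:keep])
    return C
```

With `keep` equal to the inner dimension, this reduces to an ordinary matrix product; smaller values trade accuracy for fewer multiplications.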
High precision decomposable DSP entity

Patent number: 12182534

Abstract: A digital signal processing (DSP) block includes a plurality of multipliers and a summation block separate from the plurality of multipliers. The DSP block is configurable to perform a first multiplication operation to determine a first product of a first floating-point value and a second floating-point value using only a first multiplier of the plurality of multipliers. Additionally, the DSP block is configurable to perform a second multiplication operation between a third floating-point value and a fourth floating-point value by receiving, at each of the plurality of multipliers, two integer values generated from the third floating-point value and the fourth floating-point value, generating, via the plurality of multipliers, a plurality of subproducts by multiplying, at each of the multipliers, the two integer values, and generating a second product of the second multiplication operation by adding, via the summation block, the plurality of subproducts.

Type: Grant

Filed: June 25, 2021

Date of Patent: December 31, 2024

Assignee: Intel Corporation

Inventor: Martin Langhammer
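The second multiplication mode in the abstract above (patent 12182534) builds one wide product from several narrower subproducts that a separate summation block adds together. A minimal sketch of that arithmetic on non-negative integers follows, assuming the integer values are the high and low halves of each operand; the names `split`, `decomposed_multiply`, and `low_bits` are hypothetical, and sign and exponent handling for the floating-point values is omitted.

```python
def split(value, low_bits):
    """Split a non-negative integer into (high part, low part) at `low_bits`."""
    return value >> low_bits, value & ((1 << low_bits) - 1)


def decomposed_multiply(x, y, low_bits=12):
    """Compute x*y from four narrower subproducts, one per (hypothetical) multiplier."""
    xh, xl = split(x, low_bits)
    yh, yl = split(y, low_bits)
    subproducts = [
        (xh * yh) << (2 * low_bits),  # high x high
        (xh * yl) << low_bits,        # high x low
        (xl * yh) << low_bits,        # low x high
        xl * yl,                      # low x low
    ]
    # The summation step adds the aligned subproducts into the full product.
    return sum(subproducts)
```

For example, `decomposed_multiply(0xABCDEF, 0x123456)` gives the same result as `0xABCDEF * 0x123456`.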
Systems and methods for computing dot products of nibbles in two tile operands

Patent number: 12182568

Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).

Type: Grant

Filed: August 14, 2023

Date of Patent: December 31, 2024

Assignee: Intel Corporation

Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
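The flow in the abstract above (patent 12182568) can be sketched in software. This is a minimal illustration, assuming unsigned 4-bit nibbles and signed 32-bit saturation applied after each of the K steps; the abstract does not state the signedness or where saturation is applied, and the function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def nibbles(dword):
    """Return the eight 4-bit nibbles of a 32-bit doubleword, lowest nibble first."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate(value):
    """Clamp a value to the signed 32-bit range (assumed saturation mode)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def tile_dot_product_nibbles(dst, src1, src2):
    """Accumulate nibble dot products of src1 (M x K) and src2 (K x N) into dst (M x N)."""
    M, K, N = len(src1), len(src2), len(src2[0])
    for m in range(M):
        for n in range(N):
            acc = dst[m][n]
            for k in range(K):
                # Eight products: each nibble of (m, k) times the matching nibble of (k, n).
                products = [a * b for a, b in zip(nibbles(src1[m][k]), nibbles(src2[k][n]))]
                acc = saturate(acc + sum(products))
            dst[m][n] = acc
    return dst
```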
Accelerating 2D convolutional layer mapping on a dot product architecture

Patent number: 12112141

Abstract: A method for performing a convolution operation includes storing a convolution kernel in a first storage device, the convolution kernel having dimensions x by y; storing, in a second storage device, a first subset of element values of an input feature map having dimensions n by m; performing a first simultaneous multiplication of each value of the first subset of element values of the input feature map with a first element value from among the x*y elements of the convolution kernel; for each remaining value of the x*y elements of the convolution kernel, performing a simultaneous multiplication of the remaining value with a corresponding subset of element values of the input feature map; for each simultaneous multiplication, storing a result of the simultaneous multiplication in an accumulator; and outputting the values of the accumulator as a first row of an output feature map.

Type: Grant

Filed: June 12, 2020

Date of Patent: October 8, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventors: Ali Shafiee Ardestani, Joseph Hassoun
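The method in the abstract above (patent 12112141) multiplies one kernel element at a time against a whole subset of the input feature map and accumulates the results into one output row. A minimal sketch follows, assuming stride 1, no padding, and no kernel flip; the function name and the `row` parameter are hypothetical, and the inner loop stands in for what the hardware would do in a single simultaneous step.

```python
def conv_output_row(feature_map, kernel, row=0):
    """Compute one row of the output feature map, one kernel element at a time."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_w = len(feature_map[0]) - kernel_w + 1
    accumulator = [0] * out_w
    for i in range(kernel_h):
        for j in range(kernel_w):
            weight = kernel[i][j]
            # Subset of input values that this kernel element touches for the row.
            subset = feature_map[row + i][j:j + out_w]
            # One "simultaneous multiplication": the same weight times every subset value.
            for col in range(out_w):
                accumulator[col] += weight * subset[col]
    return accumulator
```

Calling the function once per output row yields the full output feature map.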
Folded integer multiplication for field-programmable gate arrays

Patent number: 11960853

Abstract: Folded integer multiplier (FIM) circuitry includes a multiplier configurable to perform multiplication and a first addition/subtraction unit and a second addition/subtraction unit both configurable to perform addition and subtraction. The FIM circuitry is configurable to determine each product of a plurality of products for a plurality of pairs of input values having a first number of bits by performing, using the first and second addition/subtraction units, a plurality of operations involving addition or subtraction, and performing, using the multiplier, a plurality of multiplication operations involving values having fewer bits than the first number of bits. The plurality of multiplication operations includes a first number of multiplication operations, and the multiplier is configurable to begin performing all multiplication operations of the plurality of multiplication operations within a first number of clock cycles equal to the first number of multiplication operations.

Type: Grant

Filed: March 26, 2021

Date of Patent: April 16, 2024

Assignee: Intel Corporation

Inventors: Martin Langhammer, Bogdan Mihai Pasca

Pipeline architecture for bitwise multiplier-accumulator (MAC)

Patent number: 11941407

Abstract: A unit for accumulating a plurality N of multiplied M bit values includes a receiving unit, a bit-wise multiplier and a bit-wise accumulator. The receiving unit receives a pipeline of multiplicands A and B such that, at each cycle, a new set of multiplicands is received. The bit-wise multiplier bit-wise multiplies bits of a current multiplicand A with bits of a current multiplicand B and sums and carries between bit-wise multipliers. The bit-wise accumulator accumulates the output of the bit-wise multiplier, thereby accumulating the multiplicands during the pipelining process.

Type: Grant

Filed: April 5, 2020

Date of Patent: March 26, 2024

Assignee: GSI Technology Inc.

Inventor: Avidan Akerib

Posit tensor processing

Patent number: 11928442

Abstract: A method related to posit tensor processing can include receiving, by a plurality of multiply-accumulator (MAC) units coupled to one another, a plurality of universal number (unum) or posit bit strings organized in a matrix and to be used as operands in a plurality of respective recursive operations performed using the plurality of MAC units and performing, using the MAC units, the plurality of respective recursive operations. Iterations of the respective recursive operations are performed using at least one bit string that is a same bit string as was used in a preceding iteration of the respective recursive operations. The method can further include prior to receiving the plurality of unum or posit bit strings, performing an operation to organize the plurality of unum or posit bit strings to achieve a threshold bandwidth ratio, a threshold latency, or both during performance of the plurality of respective recursive operations.

Type: Grant

Filed: January 3, 2022

Date of Patent: March 12, 2024

Assignee: Micron Technology, Inc.

Inventor: Vijay S. Ramesh

Apparatus and method for performing dual signed and unsigned multiplication of packed data elements

Patent number: 11809867

Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements.

Type: Grant

Filed: September 21, 2020

Date of Patent: November 7, 2023

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang

Information processing apparatus, information processing method, non-transitory computer-readable storage medium

Patent number: 11810330

Abstract: An information processing apparatus comprises a control unit configured to set a shift amount based on a bit width of data, for each layer of a network including a plurality of layers, a plurality of MAC (multiply-accumulate) units configured to execute MAC operations on a plurality of data and a plurality of filter coefficients of the layer, a plurality of shift operation units configured to shift a plurality of MAC operation results obtained by the plurality of MAC units based on the shift amount, and an adding unit configured to calculate a total sum of the plurality of MAC operation results shifted by the plurality of shift operation units.

Type: Grant

Filed: August 29, 2022

Date of Patent: November 7, 2023

Assignee: Canon Kabushiki Kaisha

Inventor: Tsewei Chen

MAC operating device and method for processing machine learning algorithm

Patent number: 11803354

Abstract: A MAC operating device is provided, comprising a plurality of operation circuits, each including an operation capacitor and a plurality of switches, and a division capacitor, wherein one end of the operation capacitor is connected to both a first operation switch connected to an input terminal and a first reset switch connected to a ground terminal, and the other end of the operation capacitor is connected to both a second operation switch connected to the division capacitor and a second reset switch connected to the ground terminal.

Type: Grant

Filed: February 18, 2021

Date of Patent: October 31, 2023

Assignee: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY

Inventors: Seonghwan Cho, Hyuk Jin Lee, Kyung Hyun Kim, Jin-O Seo

Integrated circuits with machine learning extensions

Patent number: 11726744

Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.

Type: Grant

Filed: March 26, 2021

Date of Patent: August 15, 2023

Assignee: Intel Corporation

Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd

Hardware accelerated convolution

Patent number: 11657119

Abstract: A processing device is provided which includes memory configured to store data and a processor configured to determine, based on convolutional parameters associated with an image, a virtual general matrix-matrix multiplication (GEMM) space of a virtual GEMM space output matrix and generate, in the virtual GEMM space output matrix, a convolution result by matrix multiplying the data corresponding to a virtual GEMM space input matrix with the data corresponding to a virtual GEMM space filter matrix. The processing device also includes convolutional mapping hardware configured to map, based on the convolutional parameters, positions of the virtual GEMM space input matrix to positions of an image space of the image.

Type: Grant

Filed: August 30, 2019

Date of Patent: May 23, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Swapnil P. Sakharshete, Samuel Lawrence Wasmundt, Maxim V. Kazakov, Vineet Goel

Systems, apparatuses, and methods for fused multiply add

Patent number: 11544058

Abstract: Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand.

Type: Grant

Filed: September 28, 2021

Date of Patent: January 3, 2023

Assignee: Intel Corporation

Inventors: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

Systems, apparatuses, and methods for fused multiply add

Patent number: 11526354

Abstract: Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand.

Type: Grant

Filed: September 28, 2021

Date of Patent: December 13, 2022

Assignee: Intel Corporation

Inventors: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

Systems, apparatuses, and methods for fused multiply add

Patent number: 11526353

Abstract: Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand.

Type: Grant

Filed: September 7, 2021

Date of Patent: December 13, 2022

Assignee: Intel Corporation

Inventors: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

Systems, apparatuses, and methods for fused multiply add

Patent number: 11507369

Abstract: Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand.

Type: Grant

Filed: September 3, 2021

Date of Patent: November 22, 2022

Assignee: Intel Corporation

Inventors: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

Neural processing element with single instruction multiple data (SIMD) compute lanes

Patent number: 11507349

Abstract: An architecture is disclosed for a neural processing element having single instruction, multiple data (“SIMD”) compute lanes. The neural processing element includes compute lanes having multipliers configured to multiply a binary operand with another binary operand to generate a binary output. The neural processing element also includes a single adder tree for summing the binary outputs of the hardware binary multipliers. The neural processing element also includes a storage element for storing a binary output of the single hardware binary adder tree.

Type: Grant

Filed: June 26, 2019

Date of Patent: November 22, 2022

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Chad Balling McBride, Amol A. Ambardekar, Boris Bobrov, Kent D. Cedola, George Petre, Larry Marvin Wall

Adaptive matrix multiplication accelerator for machine learning and deep learning applications

Patent number: 11475102

Abstract: An adaptive matrix multiplier. In some embodiments, the matrix multiplier includes a first multiplying unit, a second multiplying unit, a memory load circuit, and an outer buffer circuit. The first multiplying unit includes a first inner buffer circuit and a second inner buffer circuit, and the second multiplying unit includes a first inner buffer circuit and a second inner buffer circuit. The memory load circuit is configured to load data from memory, in a single burst of a burst memory access mode, into the first inner buffer circuit of the first multiplying unit; and into the first inner buffer circuit of the second multiplying unit.

Type: Grant

Filed: May 8, 2019

Date of Patent: October 18, 2022

Assignee: Samsung Electronics Co., Ltd.

Inventors: Dongyan Jiang, Dimin Niu, Hongzhong Zheng

Information processing apparatus, information processing method, non-transitory computer-readable storage medium

Patent number: 11468600

Abstract: An information processing apparatus comprises a control unit configured to set a shift amount based on a bit width of data, for each layer of a network including a plurality of layers, a plurality of MAC (multiply-accumulate) units configured to execute MAC operations on a plurality of data and a plurality of filter coefficients of the layer, a plurality of shift operation units configured to shift a plurality of MAC operation results obtained by the plurality of MAC units based on the shift amount, and an adding unit configured to calculate a total sum of the plurality of MAC operation results shifted by the plurality of shift operation units.

Type: Grant

Filed: October 1, 2019

Date of Patent: October 11, 2022

Assignee: Canon Kabushiki Kaisha

Inventor: Tsewei Chen

Pairing SIMD lanes to perform double precision operations

Patent number: 11409536

Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.

Type: Grant

Filed: November 3, 2016

Date of Patent: August 9, 2022

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Bin He, YunXiao Zou, Jiasheng Chen, Michael Mantor

Apparatus and method for multiplication and accumulation of complex values

Patent number: 11334319

Abstract: An apparatus and method for multiplying packed unsigned words.

Type: Grant

Filed: June 30, 2017

Date of Patent: May 17, 2022

Assignee: Intel Corporation

Inventors: Venkateswara Rao Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine

Compute optimizations for low precision machine learning operations

Patent number: 11308574

Abstract: Embodiments described herein provide a graphics processor that can perform a variety of mixed and multiple precision instructions and operations. One embodiment provides a streaming multiprocessor that can concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for each of multiple instructions. The streaming multiprocessor can perform concurrent integer and floating-point operations and includes a mixed precision core to perform operations at multiple precisions.

Type: Grant

Filed: August 3, 2020

Date of Patent: April 19, 2022

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland

Conversion circuitry

Patent number: 11281428

Abstract: A data processing apparatus is provided to convert a plurality of signed digits to an output value, the data processing apparatus comprising: receiver circuitry to receive, at each of a plurality of iterations, a signed digit from the plurality of signed digits, and previous intermediate data. Conversion circuitry performs a negative-output conversion from the signed digit to an unsigned digit, such that the output value comprising the unsigned digit is negative. Concatenation circuitry concatenates bits of the unsigned digit and bits of the previous intermediate data to produce updated intermediate data, and output circuitry provides the updated intermediate data as the previous intermediate data of a next iteration. After the plurality of iterations, the output circuitry outputs at least part of the updated intermediate data as the output value.

Type: Grant

Filed: March 12, 2019

Date of Patent: March 22, 2022

Assignee: ARM LIMITED

Inventor: Javier Diaz Bruguera

Compute optimization mechanism

Patent number: 11270405

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core to perform a mixed precision multi-dimensional matrix multiply and accumulate operation on 8-bit and/or 32 bit signed or unsigned integer elements.

Type: Grant

Filed: August 3, 2020

Date of Patent: March 8, 2022

Assignee: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman

Multiple mode arithmetic circuit

Patent number: 11256476

Abstract: A tile of an FPGA includes a multiple mode arithmetic circuit. The multiple mode arithmetic circuit is configured by control signals to operate in an integer mode, a floating-point mode, or both. In some example embodiments, multiple integer modes (e.g., unsigned, two's complement, and sign-magnitude) are selectable, multiple floating-point modes (e.g., 16-bit mantissa and 8-bit sign, 8-bit mantissa and 6-bit sign, and 6-bit mantissa and 6-bit sign) are supported, or any suitable combination thereof. The tile may also fuse a memory circuit with the arithmetic circuits. Connections directly between multiple instances of the tile are also available, allowing multiple tiles to be treated as larger memories or arithmetic circuits. By using these connections, referred to as cascade inputs and outputs, the input and output bandwidth of the arithmetic circuit is further increased.

Type: Grant

Filed: August 8, 2019

Date of Patent: February 22, 2022

Assignee: Achronix Semiconductor Corporation

Inventors: Daniel Pugh, Raymond Nijssen, Michael Philip Fitton, Marcel Van der Goot

Posit tensor processing

Patent number: 11249723

Abstract: A method related to posit tensor processing can include receiving, by a plurality of multiply-accumulator (MAC) units coupled to one another, a plurality of universal number (unum) or posit bit strings organized in a matrix and to be used as operands in a plurality of respective recursive operations performed using the plurality of MAC units and performing, using the MAC units, the plurality of respective recursive operations. Iterations of the respective recursive operations are performed using at least one bit string that is a same bit string as was used in a preceding iteration of the respective recursive operations. The method can further include prior to receiving the plurality of unum or posit bit strings, performing an operation to organize the plurality of unum or posit bit strings to achieve a threshold bandwidth ratio, a threshold latency, or both during performance of the plurality of respective recursive operations.

Type: Grant

Filed: April 2, 2020

Date of Patent: February 15, 2022

Assignee: Micron Technology, Inc.

Inventor: Vijay S. Ramesh

Multiply-accumulate instruction processing method and apparatus

Patent number: 11237833

Abstract: The present invention discloses an instruction processing apparatus, comprising a first register adapted to store first source data, a second register adapted to store second source data, a third register adapted to store accumulated data, a decoder adapted to receive and decode a multiply-accumulate instruction, and an execution unit. The multiply-accumulate instruction indicates that the first register serves as a first operand, the second register serves as a second operand, the third register serves as a third operand, and a shift flag.

Type: Grant

Filed: April 10, 2020

Date of Patent: February 1, 2022

Assignee: Alibaba Group Holding Limited

Inventors: Jiahui Luo, Zhijian Chen, Yubo Guo, Wenmeng Zhang

Texture filtering with dynamic scheduling in computer graphics

Patent number: 11200723

Abstract: A texture filtering unit includes a datapath block and a control block. The datapath block includes one or more parallel computation pipelines, each containing at least one hardware logic component configured to receive a plurality of inputs and generate an output value as part of a texture filtering operation. The control block includes a plurality of sequencers and an arbiter. Each sequencer executes a micro-program that defines a sequence of operations to be performed by the one or more pipelines in the datapath block as part of a texture filtering operation and the arbiter controls access, by the sequencers, to the one or more pipelines in the datapath based on predefined prioritization rules.

Type: Grant

Filed: February 25, 2020

Date of Patent: December 14, 2021

Assignee: Imagination Technologies Limited

Inventor: Casper Van Benthem

Method and system for approximate quantum circuit synthesis using quaternion algebra

Patent number: 11113084

Abstract: This application concerns methods, apparatus, and systems for performing quantum circuit synthesis and/or for implementing the synthesis results in a quantum computer system. In certain example embodiments: a universal gate set, a target unitary described by a target angle, and target precision is received (input); a corresponding quaternion approximation of the target unitary is determined; and a quantum circuit corresponding to the quaternion approximation is synthesized, the quantum circuit being over a single qubit gate set, the single qubit gate set being realizable by the given universal gate set for the target quantum computer architecture.

Type: Grant

Filed: September 26, 2016

Date of Patent: September 7, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Vadym Kliuchnikov, Jon Yard, Martin Roetteler, Alexei Bocharov

Vector reduction processor

Patent number: 11061854

Abstract: A vector reduction circuit configured to reduce an input vector of elements comprises a plurality of cells, wherein each of the plurality of cells other than a designated first cell that receives a designated first element of the input vector is configured to receive a particular element of the input vector, receive, from another of the one or more cells, a temporary reduction element, perform a reduction operation using the particular element and the temporary reduction element, and provide, as a new temporary reduction element, a result of performing the reduction operation using the particular element and the temporary reduction element. The vector reduction circuit also comprises an output circuit configured to provide, for output as a reduction of the input vector, a new temporary reduction element corresponding to a result of performing the reduction operation using a last element of the input vector.

Type: Grant

Filed: July 1, 2020

Date of Patent: July 13, 2021

Assignee: Google LLC

Inventors: Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam
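The cell chain in the abstract above (patent 11061854) behaves like a sequential fold: the first cell forwards its element, and every later cell combines its own element with the temporary reduction element it receives. A minimal software sketch follows; the function name `chain_reduce` and the `op` parameter are hypothetical, and timing and hardware details are not modeled.

```python
import operator


def chain_reduce(vector, op):
    """Reduce a vector by passing a temporary reduction element from cell to cell."""
    if not vector:
        raise ValueError("input vector must contain at least one element")
    # The designated first cell has no incoming value, so it passes its element on.
    temporary = vector[0]
    for element in vector[1:]:
        # Each later cell combines its element with the incoming temporary value.
        temporary = op(element, temporary)
    # The value leaving the last cell is the reduction of the whole vector.
    return temporary


total = chain_reduce([3, 1, 4, 1, 5], operator.add)
largest = chain_reduce([3, 1, 4, 1, 5], max)
```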
Neural network accelerating device and method of controlling the same

Patent number: 10990354

Abstract: An accelerating device includes a signal detector that converts a first input signal and a second input signal into a first converted input signal and a second converted input signal, respectively, and that generates a final zero-value flag signal, a first one-value flag signal, and a second one-value flag signal. The accelerating device further includes a processing element (PE) that processes the first converted input signal and the second converted input signal based on the final zero-value flag signal, the first one-value flag signal, and the second one-value flag signal and that skips a first arithmetic operation and a second arithmetic operation when the final zero-value flag signal has a first value. The first value of the final zero-value flag signal indicates that the first input signal, or the second input signal, or both have a value of 0.

Type: Grant

Filed: September 12, 2019

Date of Patent: April 27, 2021

Assignee: SK hynix Inc.

Inventor: Jae Hyeok Jang
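The flag-driven skipping in the abstract above (patent 10990354) can be illustrated with a small sketch. It assumes that the first and second arithmetic operations are a multiplication and an accumulation, and that a one-value flag lets the processing element bypass the multiplier; the abstract states neither, and the names `detect_flags` and `processing_element` are hypothetical.

```python
def detect_flags(a, b):
    """Return (final zero-value flag, first one-value flag, second one-value flag)."""
    return (a == 0 or b == 0), a == 1, b == 1


def processing_element(accumulator, a, b):
    """Accumulate a*b, skipping work whenever the flags make it unnecessary."""
    zero_flag, first_is_one, second_is_one = detect_flags(a, b)
    if zero_flag:
        # Either input is 0, so both the multiply and the add are skipped.
        return accumulator
    if first_is_one:
        product = b  # assumed bypass: 1 * b needs no multiplier
    elif second_is_one:
        product = a  # assumed bypass: a * 1 needs no multiplier
    else:
        product = a * b
    return accumulator + product
```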
Digital circuit with compressed carry

Patent number: 10853037

Abstract: Embodiments of the present disclosure pertain to digital circuits with compressed carries. In one embodiment, an adder circuit generates a sum and carry. The carry is compressed to reduce the number of bits required to represent the carry. In one embodiment, a multiplier circuit generates output product values. The output product values may be summed to produce a sum and carry. The carry may be compressed. In other embodiments, a multiplier circuit receives an input sum and compressed carry. The compressed input carry is decompressed and added to output product values and the input sum, and a resulting carry is compressed. The output of such a multiplier is another sum and compressed carry.

Type: Grant

Filed: July 14, 2020

Date of Patent: December 1, 2020

Assignee: Groq, Inc.

Inventors: Christopher Aaron Clark, Jonathan Ross

Control of instruction execution in a data processor

Patent number: 10846088

Abstract: When executing a program on a data processor comprising an execution unit for executing instructions in a program to be executed by the data processor, the execution unit being associated with one or more hardware units operable to execute instructions, at least one instruction in a program is associated with an indication of whether the instruction should be issued directly for execution by a hardware unit or should be intercepted during its execution by the execution unit. The execution unit then, when decoding the instruction for execution by a hardware unit in the program, determines from the indication associated with the instruction whether the instruction should be issued directly for execution by a hardware unit or intercepted during its execution by the execution unit, and issues the instruction for execution by a hardware unit directly, or pauses execution of the instruction and performs another operation, accordingly.

Type: Grant

Filed: August 21, 2018

Date of Patent: November 24, 2020

Assignee: Arm Limited

Inventors: Mark Underwood, Hakan Lars-Goran Persson, Arne Aas
Fixed-point and floating-point arithmetic operator circuits in specialized processing blocks

Patent number: 10838695

Abstract: The present embodiments relate to circuitry that efficiently performs floating-point arithmetic operations and fixed-point arithmetic operations. Such circuitry may be implemented in specialized processing blocks. If desired, the specialized processing blocks may include configurable interconnect circuitry to support a variety of different use modes. For example, the specialized processing block may efficiently perform a fixed-point or floating-point addition operation or a portion thereof, a fixed-point or floating-point multiplication operation or a portion thereof, a fixed-point or floating-point multiply-add operation or a portion thereof, just to name a few. In some embodiments, two or more specialized processing blocks may be arranged in a cascade chain and perform together more complex operations such as a recursive mode dot product of two vectors of floating-point numbers or a Radix-2 Butterfly circuit, just to name a few.

Type: Grant

Filed: June 4, 2019

Date of Patent: November 17, 2020

Assignee: Altera Corporation

Inventor: Martin Langhammer
Reconfigurable matrix multiplier system and method

Patent number: 10817587

Abstract: A reconfigurable matrix multiplier (RMM) system/method allowing tight or loose coupling to supervisory control processor application control logic (ACL) in a system-on-a-chip (SOC) environment is disclosed. The RMM provides for C=A*B matrix multiplication operations having A-multiplier-matrix (AMM), B-multiplicand-matrix (BMM), and C-product-matrix (CPM), as well as C=A*B+D operations in which D-summation-matrix (DSM) represents the result of a previous multiplication operation or another previously defined matrix. The RMM provides for additional CPM LOAD/STORE paths allowing overlapping of compute/data transfer operations and provides for CPM data feedback to the AMM or BMM operand inputs from a previously calculated CPM result.

Type: Grant

Filed: February 26, 2018

Date of Patent: October 27, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Arthur John Redfern, Donald Edward Steiss, Timothy David Anderson, Kai Chirca
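
The C=A*B+D operation in the abstract of patent 10817587, including feeding a previous product back in as the summation matrix, can be illustrated with a minimal software sketch. This is only an arithmetic model, not the RMM hardware; the function name `matmul_add` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def matmul_add(a, b, d=None):
    """Return C = A*B + D for matrices stored as lists of rows.

    `d` plays the role of the D-summation-matrix; when omitted, the
    result is the plain product C = A*B. (Hypothetical helper.)
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    # Start the accumulator from D (copied) or from zeros.
    if d is None:
        c = [[0] * cols for _ in range(rows)]
    else:
        c = [row[:] for row in d]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                c[i][j] += a[i][k] * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
first = matmul_add(a, b)             # C = A*B
second = matmul_add(a, b, d=first)   # previous result fed back as D
```

Passing the earlier result as `d` mirrors the abstract's description of D as "the result of a previous multiplication operation".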
Apparatus and method for multiplication and accumulation of complex and real packed data elements

Patent number: 10795676

Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers.

Type: Grant

Filed: September 29, 2017

Date of Patent: October 6, 2020

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang
Microprocessor with dynamically adjustable bit width for processing data

Patent number: 10776109

Abstract: A microprocessor with dynamically adjustable bit width is provided, which has a bit width register, a datapath, a statistical register, and a bit width adjuster. The bit width register stores at least one bit width. The datapath operates according to the bit width stored in the bit width register to acquire input operands from received data and process input operands. The statistical register collects calculation results of the datapath. The bit width adjuster adjusts the bit width stored in the bit width register based on the calculation results collected in the statistical register.

Type: Grant

Filed: October 18, 2018

Date of Patent: September 15, 2020

Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.

Inventors: Jing Chen, Xiaoyang Li, Juanli Song, Zhenhua Huang, Weilin Wang, Jiin Lai
Data alignment and formatting for graphics processing unit

Patent number: 10769746

Abstract: A data queuing and format apparatus is disclosed. A first selection circuit may be configured to selectively couple a first subset of data to a first plurality of data lines dependent upon control information, and a second selection circuit may be configured to selectively couple a second subset of data to a second plurality of data lines dependent upon the control information. A storage array may include multiple storage units, and each storage unit may be configured to receive data from one or more data lines of either the first or second plurality of data lines dependent upon the control information.

Type: Grant

Filed: September 25, 2014

Date of Patent: September 8, 2020

Assignee: Apple Inc.

Inventors: Liang Xia, Robert D. Kenney, Benjiman L. Goodman, Terence M. Potter
Apparatus and method for vector multiply and accumulate of unsigned doublewords

Patent number: 10664270

Abstract: An apparatus and method for performing signed multiplication of packed signed/unsigned doublewords and accumulation with a quadword.

Type: Grant

Filed: December 21, 2017

Date of Patent: May 26, 2020

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark Charney, Jesus Corbal, Venkateswara Madduri
Stochastic rounding logic

Patent number: 10628124

Abstract: Techniques and circuits are provided for stochastic rounding. In an embodiment, a circuit includes carry-save adder (CSA) logic having three or more CSA inputs, a CSA sum output, and a CSA carry output. One of the three or more CSA inputs is presented with a random number value, while other CSA inputs are presented with input values to be summed. The circuit further includes adder logic having adder inputs and a sum output. The CSA carry output of the CSA logic is coupled with one of the adder inputs of the adder logic, and the CSA sum output of the CSA logic is coupled with another input of the adder inputs of the adder logic. A particular number of most significant bits of the sum output of the adder logic represent a stochastically rounded sum of the input values.

Type: Grant

Filed: March 22, 2018

Date of Patent: April 21, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventor: Gabriel H. Loh
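
The data flow in the abstract of patent 10628124 (a carry-save adder with one input carrying a random value, followed by an ordinary adder whose most significant bits are kept) can be modeled in software. The sketch below assumes non-negative integers, a three-input CSA, and a random value exactly as wide as the number of dropped bits; the function names and the `drop_bits` parameter are hypothetical and not taken from the patent.

```python
import random


def carry_save_add(a, b, c):
    """3:2 carry-save adder: returns (sum, carry) with sum + carry == a + b + c."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted left
    return s, carry


def stochastic_round_sum(x, y, drop_bits, rng=random):
    """Add x and y, then drop `drop_bits` low bits with stochastic rounding.

    Assumes x, y >= 0 and drop_bits >= 1. `rng` is any object with a
    getrandbits method (the random module or a random.Random instance).
    """
    r = rng.getrandbits(drop_bits)       # random value on one CSA input
    s, c = carry_save_add(x, y, r)       # CSA stage
    total = s + c                        # conventional adder stage
    return total >> drop_bits            # keep only the most significant bits


# 3 + 4 = 7; dropping two bits gives 1.75, so the result is 1 or 2.
result = stochastic_round_sum(3, 4, drop_bits=2, rng=random.Random(0))
```

Because the random value is added before truncation, a sum whose dropped bits are larger rounds up more often, which is the property that makes the rounding unbiased on average.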
Efficient extended-precision processing

Patent number: 10546045

Abstract: Systems and methods are provided for performing a dot product. Each of a first series of numbers is divided into a first value, comprising the N most significant bits of the number, and a second value to form first and second sets of values. Each of a second series of numbers is divided into a third value, comprising the N most significant bits of the number, and a fourth value to form third and fourth sets of values. A dot product of the first and fourth sets of values is computed to provide a first partial sum. A dot product of the first and third sets of values is computed to provide a second partial sum. A dot product of the second and third sets of values is computed to provide a third partial sum. The partial sums are summed to provide a result for the dot product.

Type: Grant

Filed: December 19, 2017

Date of Patent: January 28, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Lester Anderson Longley, Misael Lopez Cruz, Victor Cheng
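
The splitting scheme in the abstract of patent 10546045 can be sketched as follows. The sketch assumes unsigned integers of a fixed width and keeps each high part at its original bit weight; the function names and the `total_bits` and `n` parameters are illustrative assumptions. Note that the abstract lists three partial sums, so the low-by-low product is not included.

```python
def split(value, total_bits, n):
    """Split an unsigned value into (high, low) parts.

    `high` holds the n most significant bits, left at their original
    weight; `low` holds the remaining total_bits - n bits.
    """
    low_bits = total_bits - n
    high = (value >> low_bits) << low_bits
    return high, value - high


def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def split_dot(a, b, total_bits=16, n=8):
    """Dot product of non-empty sequences built from three partial sums."""
    a_hi, a_lo = zip(*(split(x, total_bits, n) for x in a))  # first, second sets
    b_hi, b_lo = zip(*(split(y, total_bits, n) for y in b))  # third, fourth sets
    first = dot(a_hi, b_lo)    # first and fourth sets
    second = dot(a_hi, b_hi)   # first and third sets
    third = dot(a_lo, b_hi)    # second and third sets
    return first + second + third


a = [0x1234, 0xFFFF]
b = [0x00FF, 0x8001]
approx = split_dot(a, b)
exact = dot(a, b)
```

In this model `exact - approx` equals the omitted low-by-low dot product, which is small relative to the result when `n` covers most of the significant bits.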
Instructions for fused multiply-add operations with variable precision input operands

Patent number: 10528346

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

Type: Grant

Filed: March 29, 2018

Date of Patent: January 7, 2020

Assignee: Intel Corporation

Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
Multiply-add operations of binary numbers in an arithmetic unit

Patent number: 10372417

Abstract: Disclosed herein is a computer implemented method for performing multiply-add operations of binary numbers P, Q, R, S, B in an arithmetic unit of a processor, the operation calculating a result as an accumulated sum, which equals B+n×P×Q+m×R×S, where n and m are natural numbers. Further disclosed herein is an arithmetic unit configured to implement multiply-add operations of binary numbers P, Q, R, S, B comprising at least a first binary arithmetic unit for calculating an aligned high part result and a second binary arithmetic unit for calculating an aligned low part result of the multiply-add operations.

Type: Grant

Filed: July 13, 2017

Date of Patent: August 6, 2019

Assignee: International Business Machines Corporation

Inventors: Tina Babinsky, Michael Klein, Cedric Lichtenau, Silvia M. Mueller
Machine perception and dense algorithm integrated circuit

Patent number: 10365860

Abstract: A circuit that includes a plurality of array cores, each array core of the plurality of array cores comprising: a plurality of distinct data processing circuits; and a data queue register file; a plurality of border cores, each border core of the plurality of border cores comprising: at least a register file, wherein: [i] at least a subset of the plurality of border cores encompasses a periphery of a first subset of the plurality of array cores; and [ii] a combination of the plurality of array cores and the plurality of border cores define an integrated circuit array.

Type: Grant

Filed: March 1, 2019

Date of Patent: July 30, 2019

Assignee: quadric.io, Inc.

Inventors: Nigel Drego, Aman Sikka, Mrinalini Ravichandran, Ananth Durbha, Robert Daniel Firu, Veerbhan Kheterpal
Tensor register files

Patent number: 10338925

Abstract: Tensor register files in a hardware accelerator are disclosed. An apparatus may comprise tensor operation calculators each configured to perform a type of tensor operation. The apparatus may also comprise tensor register files, each of which is associated with one of the tensor operation calculators. The apparatus may also comprise logic configured to store respective ones of the tensors in the plurality of tensor register files in accordance with the type of tensor operation to be performed on the respective tensors. The apparatus may also control read access to tensor register files based on a type of tensor operation that a machine instruction is to perform.

Type: Grant

Filed: May 24, 2017

Date of Patent: July 2, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jeremy Halden Fowers, Steven Karl Reinhardt, Kalin Ovtcharov, Eric Sen Chung
Processor and method for executing in-memory copy instructions indicating on-chip or off-chip memory

Patent number: 10261796

Abstract: A processor and a method for executing an instruction on a processor are provided. In the method, a to-be-executed instruction is fetched, the instruction including a source address field, a destination address field, an operation type field, and an operation parameter field; in at least one execution unit, an execution unit controlled by a to-be-generated control signal according to the operation type field is determined, a source address and a destination address of data operated by the execution unit are determined according to the source address field and the destination address field, and a data amount of the data operated by the execution unit controlled by the to-be-generated control signal is determined according to the operation parameter field; the control signal is generated; and the execution unit in the at least one execution unit is controlled by using the control signal.

Type: Grant

Filed: November 23, 2016

Date of Patent: April 16, 2019

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD

Inventors: Jian Ouyang, Wei Qi, Yong Wang
Write nullification

Patent number: 10198263

Abstract: Apparatus and methods are disclosed for nullifying one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain a register identification of at least one of a plurality of registers, based on a target field of the nullification instruction. A write to the at least one register associated with the register identification is nullified. The nullification instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified write to the at least one register, a subsequent instruction is executed from a second, different instruction block.

Type: Grant

Filed: March 3, 2016

Date of Patent: February 5, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Aaron L. Smith
Resistive memory arrays for performing multiply-accumulate operations

Patent number: 10169297

Abstract: In one example in accordance with the present disclosure a resistive memory array is described. The array includes a number of resistive memory elements to receive a common-valued read signal. The array also includes a number of multiplication engines to perform a multiply operation by receiving a memory element output from a corresponding resistive memory element, receiving an input signal, and generating a multiplication output based on a received memory element output and a received input signal. The array also includes an accumulation engine to sum multiplication outputs from the number of multiplication engines.

Type: Grant

Filed: April 16, 2015

Date of Patent: January 1, 2019

Assignee: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP

Inventor: Brent Buchanan
Correlation operation circuit and semiconductor device

Patent number: 10152456

Abstract: A correlation operation circuit includes a first SRAM storing a plurality of pieces of detection pattern data, product-sum operators, a second SRAM storing intermediate data, and a comparator. When time series data is sequentially input, the intermediate data of all correlation functions referring to one time series data is updated in a period during which the one time series data is input. When one time series data is input, the product-sum operator multiplies the detection pattern data sequentially read from the first SRAM by the one input time series data. The corresponding intermediate data is read from the second SRAM in synchronization with the multiplication, and the sequentially-calculated products are cumulatively added to the read intermediate data to be written back into the second SRAM as the intermediate data. As a result, the calculated correlation function data is supplied to the comparator to be compared with a predetermined specified value.

Type: Grant

Filed: May 1, 2017

Date of Patent: December 11, 2018

Assignee: Renesas Electronics Corporation

Inventor: Hiroshi Ueki
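
The update order in the abstract of patent 10152456 (each incoming sample is multiplied by every detection pattern value, and the products are accumulated into stored intermediate data) can be sketched in software. The sketch assumes a single detection pattern, a circular list standing in for the second SRAM, and a "greater than or equal" comparison; none of these details, nor the names used, come from the patent.

```python
def streaming_correlation(samples, pattern, threshold):
    """Return (index, value) pairs where the correlation reaches threshold.

    The correlation at index k is sum(pattern[j] * samples[k + j]).
    Each sample is visited once and updates every partial sum that uses it.
    """
    m = len(pattern)
    intermediate = [0] * m      # slot k % m holds the partial sum for index k
    hits = []
    for t, x in enumerate(samples):
        for j, p in enumerate(pattern):
            k = t - j           # correlation index this product belongs to
            if k >= 0:
                intermediate[k % m] += p * x   # read, accumulate, write back
        done = t - m + 1        # index whose sum is complete after sample t
        if done >= 0:
            value = intermediate[done % m]
            intermediate[done % m] = 0         # free the slot for reuse
            if value >= threshold:             # comparator stage
                hits.append((done, value))
    return hits


hits = streaming_correlation([0, 1, 2, 1, 0, 1, 2, 1], [1, 2, 1], threshold=6)
```

Only as many intermediate values as there are pattern entries are held at any time, which is why each input sample never needs to be stored or revisited.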
