Sum Of Products Generation Patents (Class 708/603)

Compressed fixed-point SIMD macroblock rotation systems and methods

Patent number: 12380666

Abstract: Various techniques are provided for efficient bilinear interpolation of rotated pixels. In one example, a method includes identifying a rotation angle for an image; performing a vector load of pixel positions for the image at the rotation angle; performing a vector load of rows of pixels associated with the pixel positions; performing a vector selection of a subset of pixels from the rows of pixels based on the identified pixel positions; performing a vector load of a set of coefficients at the rotation angle; and applying the set of coefficients to the subset of pixels to determine an updated value for the image. Additional methods and systems are also provided.

Type: Grant

Filed: October 4, 2022

Date of Patent: August 5, 2025

Assignee: FLIR UNMANNED AERIAL SYSTEMS AS

Inventors: Lars Petter Endresen, Øystein Hovind
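
The steps in the abstract above (positions for a rotation angle, neighbouring pixels, per-angle coefficients, weighted sum) can be illustrated with a scalar sketch. This is not the patented SIMD or fixed-point implementation: the helper names build_tables and rotate, the rotation about the image centre, and the floating-point weights are all assumptions made for illustration.

```python
import math


def build_tables(angle_rad, height, width):
    """For one rotation angle, precompute source positions and the four
    bilinear coefficients per output pixel (hypothetical helper)."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    table = {}
    for y in range(height):
        for x in range(width):
            # Inverse-map the output pixel back into the source image.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            x0, y0 = math.floor(sx), math.floor(sy)
            if not (0 <= x0 < width and 0 <= y0 < height):
                continue  # source position falls outside the image
            fx, fy = sx - x0, sy - y0
            coeffs = ((1 - fx) * (1 - fy), fx * (1 - fy),
                      (1 - fx) * fy, fx * fy)
            table[(y, x)] = (y0, x0, coeffs)
    return table


def rotate(image, angle_rad):
    """Rotate a grayscale image (list of rows) using the per-angle tables."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for (y, x), (y0, x0, c) in build_tables(angle_rad, height, width).items():
        # Clamp the right/bottom neighbours at the image border.
        y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
        out[y][x] = (c[0] * image[y0][x0] + c[1] * image[y0][x1]
                     + c[2] * image[y1][x0] + c[3] * image[y1][x1])
    return out
```

Because the tables depend only on the angle and the image size, they could be computed once per angle and reused, which is the part a vectorised version would load in bulk.
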
Drive circuit and display apparatus

Patent number: 12300138

Abstract: A drive circuit disclosed by the present application includes: a first terminal; a plurality of second terminals; a first circuit module electrically connected to the first terminal and the plurality of second terminals, where the first circuit module is configured to reduce alternating current power generated when a drive signal accessed by the first terminal is transmitted to the plurality of second terminals; and a plurality of second circuit modules, where the plurality of second circuit modules are one-to-one electrically connected to the plurality of second terminals, and the second circuit modules each are configured to output a data signal based on the drive signal.

Type: Grant

Filed: September 14, 2021

Date of Patent: May 13, 2025

Assignee: TCL China Star Optoelectronics Technology Co., Ltd.

Inventor: Jinfeng Liu
Systolic neural CPU processor

Patent number: 12242416

Abstract: A systolic neural CPU (SNCPU) including a two-dimensional systolic array of reconfigurable processing elements (PE's) fuses a conventional CPU with a convolutional neural network (CNN) accelerator in four phases of operation: row-CPU, column-accelerator, column-CPU, and row-accelerator. The SNCPU cycles through the four phases to avoid costly data movement across cores, reduce overhead, and reduce latency. The PE's communicate bidirectionally with neighboring PE's and memory units at an outer edge of the array. A row of PE's is configurable into a first deep neural network (DNN) accumulator at a first time and configurable into a first CPU pipeline at a second time. A column of PE's is configurable into a second DNN accumulator at a third time and configurable into a second CPU pipeline at a fourth time.

Type: Grant

Filed: December 23, 2022

Date of Patent: March 4, 2025

Assignee: Northwestern University

Inventors: Jie Gu, Yuhao Ju
Highly parallel convolutional neural network

Patent number: 12242951

Abstract: A CNN inference engine that convolves an input data set with a weight data set is disclosed together with components that facilitate such computation. The engine includes a plurality of multiply and accumulate processors (MACs), each MAC causing a value in the accumulator to be augmented by a product of a data value received on an input data port and a weight value received on a weight port. The engine also includes a slice buffer having a plurality of output ports, each output port being connected to one of the MAC input data value ports. The engine causes the slice buffer to connect one of the slices to the plurality of slice buffer output ports, and causes a weight received on an inference engine weight port to be input to each MAC weight port. The MACs process the input data values on the output ports in the slice in parallel.

Type: Grant

Filed: June 15, 2021

Date of Patent: March 4, 2025

Assignee: Ocean Logic Pty Ltd

Inventor: Vincenzo Liguori
Information estimation apparatus and information estimation method

Patent number: 12136032

Abstract: A technique for stable and fast computation of a variance representing a confidence interval for an estimation result in an estimation apparatus using a neural network including an integrated layer that combines a dropout layer for dropping out part of input data and an FC layer for computing a weight is provided. When input data having a multivariate distribution is supplied to the integrated layer, a data analysis unit 30 determines, based on a numerical distribution of terms formed by respective products of each vector element of the input data and the weight, a data type of each vector element of output data from the integrated layer. An estimated confidence interval computation unit 20 applies an approximate computation method associated with the data type, to analytically compute a variance of each vector element of the output data from the integrated layer based on the input data to the integrated layer.

Type: Grant

Filed: November 14, 2017

Date of Patent: November 5, 2024

Assignee: DENSO IT LABORATORY, INC.

Inventor: Jingo Adachi
Multi-precision digital compute-in-memory deep neural network engine for flexible and energy efficient inferencing

Patent number: 12079733

Abstract: A non-volatile memory structure capable of storing weights for layers of a deep neural network (DNN) and performing an inferencing operation within the structure is presented. An in-array multiplication can be performed between multi-bit valued inputs, or activations, for a layer of the DNN and multi-bit valued weights of the layer. Each bit of a weight value is stored in a binary valued memory cell of the memory array and each bit of the input is applied as a binary input to a word line of the array for the multiplication of the input with the weight. To perform a multiply and accumulate operation, the results of the multiplications are accumulated by adders connected to sense amplifiers along the bit lines of the array. The adders can be configured to multiple levels of precision, so that the same structure can accommodate weights and activations of 8-bit, 4-bit, and 2-bit precision.

Type: Grant

Filed: July 28, 2020

Date of Patent: September 3, 2024

Assignee: SanDisk Technologies LLC

Inventors: Tung Thanh Hoang, Won Ho Choi, Martin Lueker-Boden
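
The bit-level multiplication described in the abstract above can be mimicked in software. The sketch below assumes unsigned operands and models each binary cell as an AND of one input bit and one weight bit; the function names and the bits parameter are hypothetical and say nothing about how the actual memory array is wired.

```python
def bit_serial_multiply(activation, weight, bits):
    """Multiply two unsigned `bits`-wide integers one bit pair at a time."""
    total = 0
    for i in range(bits):              # input bit applied to a word line
        a_bit = (activation >> i) & 1
        for j in range(bits):          # weight bit held in one binary cell
            w_bit = (weight >> j) & 1
            # The cell contributes a_bit AND w_bit; the adder weights that
            # contribution by the combined bit position.
            total += (a_bit & w_bit) << (i + j)
    return total


def mac(activations, weights, bits):
    """Accumulate the bit-serial products of paired activations and weights."""
    limit = 1 << bits
    values = list(activations) + list(weights)
    if any(v < 0 or v >= limit for v in values):
        raise ValueError("operands must fit in the configured precision")
    return sum(bit_serial_multiply(a, w, bits)
               for a, w in zip(activations, weights))
```

Calling mac with bits set to 8, 4, or 2 corresponds to the three precisions the abstract mentions; only the loop bounds change, which is the sense in which one structure can serve several precisions.
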
Multi-mode architecture for unifying matrix multiplication, 1×1 convolution and 3×3 convolution

Patent number: 12008069

Abstract: In a system with control logic and a processing element array, two modes of operation may be provided. In the first mode of operation, the control logic may configure the system to perform matrix multiplication or 1×1 convolution. In the second mode of operation, the control logic may configure the system to perform 3×3 convolution. The processing element array may include an array of processing elements. Each of the processing elements may be configured to compute the dot product of two vectors in a single clock cycle, and further may accumulate the dot products that are sequentially computed over time.

Type: Grant

Filed: November 29, 2023

Date of Patent: June 11, 2024

Assignee: Recogni Inc.

Inventors: Jian hui Huang, Gary S. Goldman
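
The abstract above groups matrix multiplication and 1×1 convolution in the same mode. A small sketch shows why that pairing is natural: a 1×1 convolution is a matrix product between the flattened pixels and the kernel, built from repeated dot products. The data layout (feature_map[h][w][c_in], kernel[c_in][c_out]) and the function names are assumptions for illustration only.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (one processing-element step)."""
    return sum(a * b for a, b in zip(u, v))


def matmul(a, b):
    """Plain matrix product built from dot products of rows and columns."""
    columns = list(zip(*b))
    return [[dot(row, col) for col in columns] for row in a]


def conv1x1(feature_map, kernel):
    """1x1 convolution: feature_map[h][w][c_in] with kernel[c_in][c_out]."""
    height, width = len(feature_map), len(feature_map[0])
    # Flatten the spatial grid into an (H*W) x C_in matrix.
    pixels = [pixel for row in feature_map for pixel in row]
    flat = matmul(pixels, kernel)
    # Restore the H x W layout; each entry now has C_out channels.
    return [flat[r * width:(r + 1) * width] for r in range(height)]
```

A 3×3 convolution needs neighbouring pixels for each output, which is presumably why the abstract gives it a separate mode.
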
Neural network data computation using mixed-precision

Patent number: 12001953

Abstract: Techniques for mixed-precision data manipulation for neural network data computation are disclosed. A first left group comprising eight bytes of data and a first right group of eight bytes of data are obtained for computation using a processor. A second left group comprising eight bytes of data and a second right group of eight bytes of data are obtained. A sum of products is performed between the first left and right groups and the second left and right groups. The sum of products is performed on bytes of 8-bit integer data. A first result is based on a summation of eight values that are products of the first group's left eight bytes and the second group's left eight bytes. A second result is based on the summation of eight values that are products of the first group's left eight bytes and the second group's right eight bytes. Results are output.

Type: Grant

Filed: February 24, 2023

Date of Patent: June 4, 2024

Assignee: MIPS Tech, LLC

Inventors: James Hippisley Robinson, Sanjay Patel
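
The two results named in the abstract above can be written out directly. The sketch below computes only those two sums; treating the bytes as signed 8-bit integers and the function names are assumptions, since the abstract says only "8-bit integer data".

```python
def to_int8(value):
    """Interpret the low 8 bits of `value` as a signed integer (assumed)."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def sum_of_products(first_left, second_left, second_right):
    """Return (first-left . second-left, first-left . second-right)."""
    groups = (first_left, second_left, second_right)
    if any(len(g) != 8 for g in groups):
        raise ValueError("each group must hold exactly eight bytes")
    a = [to_int8(b) for b in first_left]
    # Each result is a summation of eight byte products.
    first = sum(x * to_int8(y) for x, y in zip(a, second_left))
    second = sum(x * to_int8(y) for x, y in zip(a, second_right))
    return first, second
```

The products of 8-bit inputs are accumulated in wider integers here, which is one reading of "mixed precision"; the width of the real accumulator is not stated in the abstract.
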
Product-sum arithmetic device, product-sum arithmetic circuit, and product-sum arithmetic method

Patent number: 11947929

Abstract: An arithmetic device includes a comparison unit comparing voltage generated with charge stored in a storage unit with a threshold, and outputting an output signal at a timing when the voltage exceeds the threshold, and a timing extension unit extending an interval between timings at each of which the output signal is output.

Type: Grant

Filed: July 4, 2019

Date of Patent: April 2, 2024

Assignee: SONY CORPORATION

Inventor: Hiroyuki Yamagishi
Multiply-accumulate device and multiply-accumulate method

Patent number: 11900184

Abstract: A multiply-accumulate device (10) includes: a comparison unit (18) that compares, with a threshold voltage, a voltage generated by an electric charge stored in a storage unit (14), and outputs an output signal at timing at which the voltage exceeds the threshold voltage; and a control circuit (110) that reduces, based on a predetermined set value, a charging current to the storage unit (14) from a plurality of input units (13) connected to the storage unit (14).

Type: Grant

Filed: July 5, 2019

Date of Patent: February 13, 2024

Assignee: Sony Group Corporation

Inventors: Yasushi Fujinami, Hiroyuki Yamagishi
Apparatus and method using neural network

Patent number: 11625224

Abstract: An apparatus includes a first holding unit and a second holding unit configured to hold first-type data and second-type data, respectively, a first operation unit configured to execute a first product-sum operation based on the first-type data, a branch unit configured to output an operation result of the first product-sum operation in parallel, a sampling unit configured to sample the operation result and to output a sampling result, and a second operation unit configured to execute a second product-sum operation based on the second-type data and the sampling result.

Type: Grant

Filed: April 17, 2019

Date of Patent: April 11, 2023

Assignee: CANON KABUSHIKI KAISHA

Inventors: Tsewei Chen, Masami Kato, Masahiro Ariizumi
Circuit

Patent number: 11614919

Abstract: A circuit, comprising a first term operation circuit and a second term operation circuit, a third term operation circuit, and a second calculation circuit. Each of the first and the second term operation circuits comprises multiple higher bit operation circuits, a lowest bit operation circuit, and a first calculation circuit. Each of the higher bit operation circuits selectively left-shifts a multiplicand by different bits, outputs the shifted multiplicand, determines a sign of the shifted multiplicand, and left-shifts the shifted multiplicand. The lowest bit operation circuit outputs the multiplicand, and determines a sign of the multiplicand. The first calculation circuit generates a term operation result. The third term operation circuit generates a third term operation result. The second calculation circuit adds the term operation result of the first and second term operation circuits and the third term operation result to generate a total operation result.

Type: Grant

Filed: August 13, 2020

Date of Patent: March 28, 2023

Assignee: REALTEK SEMICONDUCTOR CORPORATION

Inventor: Szu-Chun Chang
Neural network data computation using mixed-precision

Patent number: 11615307

Abstract: Techniques for mixed-precision data manipulation for neural network data computation are disclosed. A first left group comprising eight bytes of data and a first right group of eight bytes of data are obtained for computation using a processor. A second left group comprising eight bytes of data and a second right group of eight bytes of data are obtained. A sum of products is performed between the first left and right groups and the second left and right groups. The sum of products is performed on bytes of 8-bit integer data. A first result is based on a summation of eight values that are products of the first group's left eight bytes and the second group's left eight bytes. A second result is based on the summation of eight values that are products of the first group's left eight bytes and the second group's right eight bytes. Results are output.

Type: Grant

Filed: August 5, 2020

Date of Patent: March 28, 2023

Assignee: MIPS Tech, LLC

Inventors: James Hippisley Robinson, Sanjay Patel
Processing-in-memory (PIM) system including multiplying-and-accumulating (MAC) circuit

Patent number: 11500629

Abstract: A multiplying-and-accumulating (MAC) circuit includes a multiplying circuit and an adding circuit. The multiplying circuit includes a first multiplier and a second multiplier, and each of the first multiplier and the second multiplier performs a multiplying calculation for first input data with N bits and second input data with M bits to output multiplication result data with (N+M) bits (where, “N” and “M” are natural numbers which are equal to or greater than one). The adding circuit includes an adder which performs an adding calculation for the multiplication result data of the first multiplier and the multiplication result data of the second multiplier to output addition result data with (N+M) bits.

Type: Grant

Filed: January 8, 2021

Date of Patent: November 15, 2022

Assignee: SK hynix Inc.

Inventor: Choung Ki Song
Systems, apparatuses, and methods for chained fused multiply add

Patent number: 11487541

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add are described. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.

Type: Grant

Filed: November 30, 2020

Date of Patent: November 1, 2022

Assignee: Intel Corporation

Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
Using a low-bit-width dot product engine to sum high-bit-width numbers

Patent number: 11455143

Abstract: A device (e.g., an integrated circuit chip) includes a dot product processing component, a data alignment component, and an accumulator. The dot product processing component is configured to calculate a dot product of a first group of elements stored in a first storage unit with a second group of elements, wherein: each element of the first group of elements is represented using a first number of bits, each value of a group of values stored in the first storage unit is represented using a second number of bits greater than the first number of bits, and each value of the group of values is stored as split segments across more than one element of the elements of the first group of elements. The data alignment component is configured to receive results of the dot product processing component and modify one or more of the results of the dot product processing component. The accumulator is configured to sum outputs of the data alignment component to at least in part determine a sum of the group of values.

Type: Grant

Filed: May 7, 2020

Date of Patent: September 27, 2022

Assignee: Meta Platforms, Inc.

Inventors: Thomas Mark Ulrich, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh
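
The idea in the abstract above (store each wide value as several narrow segments, run a narrow dot product per segment position, realign, then accumulate) can be sketched as follows. The 8-bit segment width, unsigned values, and the all-ones second operand are assumptions; the abstract does not specify them.

```python
SEG_BITS = 8  # width the hypothetical dot-product engine accepts


def split(value, segments):
    """Split an unsigned value into `segments` pieces, low segment first."""
    mask = (1 << SEG_BITS) - 1
    return [(value >> (SEG_BITS * k)) & mask for k in range(segments)]


def narrow_dot(a, b):
    """Dot product whose first operand must fit in SEG_BITS bits."""
    if any(x < 0 or x >= (1 << SEG_BITS) for x in a):
        raise ValueError("element exceeds the engine's bit width")
    return sum(x * y for x, y in zip(a, b))


def sum_wide(values, value_bits=32):
    """Sum wide unsigned values using only narrow dot products."""
    segments = value_bits // SEG_BITS
    pieces = [split(v, segments) for v in values]
    ones = [1] * len(values)          # dot with ones == plain summation
    total = 0
    for k in range(segments):
        partial = narrow_dot([p[k] for p in pieces], ones)
        # Data alignment: shift each partial sum back to its bit position.
        total += partial << (SEG_BITS * k)
    return total
```

The realignment step is what lets the carries between segments resolve in the final accumulator rather than inside the narrow engine.
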
Ultra-low precision floating-point fused multiply-accumulate unit

Patent number: 11455142

Abstract: Embodiments are provided for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.

Type: Grant

Filed: June 5, 2019

Date of Patent: September 27, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ankur Agrawal, Silvia Mueller, Kailash Gopalakrishnan, Bruce Fleischer, Balaram Sinharoy, Mingu Kang
Systems and methods for performing 16-bit floating-point vector dot product instructions

Patent number: 11366663

Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.

Type: Grant

Filed: November 9, 2018

Date of Patent: June 21, 2022

Assignee: Intel Corporation

Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
Computation circuit including a plurality of processing elements coupled to a common accumulator, a computation device and a system including the same

Patent number: 11262982

Abstract: A computation circuit includes a plurality of processing elements and a common accumulator. The plurality of processing elements are sequentially coupled in series, and perform a multiply and accumulate (MAC) operation on a weight signal and at least one of two or more input signals received in each unit cycle. The common accumulator is sequentially and cyclically coupled to first to Kth processing elements among the plurality of processing elements, and configured to receive a computation value outputted from a processing element coupled thereto among the first to Kth processing elements, and store computation information. K is decided based on values of the two or more input signals and the number of guard bits included in one processing element.

Type: Grant

Filed: July 22, 2019

Date of Patent: March 1, 2022

Assignees: SK hynix Inc., SK Telecom Co., Ltd.

Inventors: Yong Sang Park, Seok Joong Hwang
Computation device having a multiplexer and several multipliers and computation system

Patent number: 11188305

Abstract: A computation device includes: a data multiplexer configured to output first high-order data as first output data and fifth output data, output first low-order data as third output data and seventh output data, output second high-order data as second output data, output second low-order data as fourth output data, output third high-order data, which is high-order data having a second bit number out of third input data, as sixth output data, and output third low-order data, which is low-order data having the second bit number out of the third input data, as eighth output data when a mode signal indicates a second computation mode; and first to fourth multipliers each of which multiplies two output data.

Type: Grant

Filed: May 11, 2018

Date of Patent: November 30, 2021

Assignees: Preferred Networks, Inc., Riken

Inventors: Junichiro Makino, Takayuki Muranushi, Miyuki Tsubouchi, Ken Namura
Solving multivariate quadratic problems using digital or quantum annealing

Patent number: 11163532

Abstract: A method may include obtaining a set of multivariate quadratic polynomials associated with a multivariate quadratic problem and generating an Ising Model connection weight matrix “W” and an Ising Model bias vector “b” based on the multivariate quadratic polynomials. The method may also include providing the matrix “W” and the vector “b” to an annealing system configured to solve problems written according to the Ising Model and obtaining an output from the annealing system that represents a set of integers. The method may also include using the set of integers as a solution to the multivariate quadratic problem.

Type: Grant

Filed: January 18, 2019

Date of Patent: November 2, 2021

Assignee: FUJITSU LIMITED

Inventors: Hart Montgomery, Arnab Roy, Ryuichi Ohori, Toshiya Shimizu, Takeshi Shimoyama, Jumpei Yamaguchi
Transcendental calculation unit apparatus and method

Patent number: 10983755

Abstract: A transcendental calculation unit includes a configuration table that stores a set of constants and provides a selected one of the constants, a power series multiplier that iteratively develops a power series, a coefficient series multiplier and accumulator that develops an accumulated product of the power series and the constant, and a round and normalize stage that rounds the accumulated product and normalizes the rounded product.

Type: Grant

Filed: July 22, 2020

Date of Patent: April 20, 2021

Inventor: Mitchell K. Alsup
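
The data flow in the abstract above (a table of constants, a multiplier that develops successive powers, and a multiplier-accumulator that combines each power with its constant) can be sketched in a few lines. The Taylor coefficients for exp used here are only an illustrative table; the constants in the actual unit, and its round and normalize stage, are not modelled.

```python
import math


def evaluate_series(x, coefficients):
    """Accumulate sum(c_k * x**k), developing the powers iteratively."""
    power = 1.0   # x**0, then x**1, x**2, ... one multiply per step
    acc = 0.0
    for c in coefficients:
        acc += c * power   # coefficient multiplier and accumulator
        power *= x         # power series multiplier
    return acc


# Illustrative constant table (assumed): Taylor coefficients of exp(x).
EXP_COEFFS = [1.0 / math.factorial(k) for k in range(12)]
```

For example, evaluate_series(0.5, EXP_COEFFS) agrees with math.exp(0.5) to well below 1e-9, because the first omitted term is about 5e-13.
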
Semiconductor device for performing sum-of-product computation and operating method thereof

Patent number: 10970044

Abstract: A semiconductor device for performing a sum-of-product computation and an operating method thereof are provided. The semiconductor device includes an inputting circuit, a scaling circuit, a computing memory and an outputting circuit. The inputting circuit is used for receiving a plurality of inputting signals. The inputting signals are voltages or currents. The scaling circuit is connected to the inputting circuit for transforming the inputting signals to be a plurality of compensated signals respectively. The compensated signals are voltages or currents. The computing memory is connected to the scaling circuit. The computing memory includes a plurality of computing cells and the compensated signals are applied to the computing cells respectively. The outputting circuit is connected to the computing memory for reading an outputting signal of the computing cells. The outputting signal is a voltage or current.

Type: Grant

Filed: May 9, 2019

Date of Patent: April 6, 2021

Assignee: MACRONIX INTERNATIONAL CO., LTD.

Inventors: Ming-Hsiu Lee, Chao-Hung Wang
Apparatus and method to switch configurable logic units

Patent number: 10963265

Abstract: Examples described herein include systems and methods which include an apparatus comprising a plurality of configurable logic units and a plurality of switches, with each switch being coupled to at least one configurable logic unit of the plurality of configurable logic units. The apparatus further includes an instruction register configured to provide respective switch instructions of a plurality of switch instructions to each switch based on a computation to be implemented among the plurality of configurable logic units. For example, the switch instructions may include allocating the plurality of configurable logic units to perform the computation and activating an input of the switch and an output of the switch to couple at least a first configurable logic unit and a second configurable logic unit. In various embodiments, configurable logic units can include arithmetic logic units (ALUs), bit manipulation units (BMUs), and multiplier-accumulator units (MACs).

Type: Grant

Filed: April 21, 2017

Date of Patent: March 30, 2021

Assignee: Micron Technology, Inc.

Inventors: Fa-Long Luo, Tamara Schmitz, Jeremy Chritz, Jaime Cummins
Operation processing apparatus, information processing apparatus and information processing method

Patent number: 10936939

Abstract: An operation processing apparatus includes a memory and a processor coupled to the memory. The processor executes an operation according to an operation instruction, acquires statistical information for a distribution of bits in fixed point data after an execution of an operation for the fixed point data according to an acquisition instruction, and outputs the statistical information to a register designated by the acquisition instruction.

Type: Grant

Filed: February 14, 2019

Date of Patent: March 2, 2021

Assignee: FUJITSU LIMITED

Inventors: Mitsuru Tomono, Makiko Ito
Execution unit accelerator

Patent number: 10929134

Abstract: A processor to facilitate acceleration of instruction execution is disclosed. The processor includes a plurality of execution units (EUs), each including an instruction decode unit to decode an instruction into one or more operands and opcode defining an operation to be performed at an accelerator, a register file having a plurality of registers to store the one or more operands and an accelerator having programmable hardware to retrieve the one or more operands from the register file and perform the operation on the one or more operands.

Type: Grant

Filed: June 28, 2019

Date of Patent: February 23, 2021

Assignee: Intel Corporation

Inventors: Radhakrishna Sripada, Peter Yiannacouras, Josh Triplett, Nagabhushan Chitlur, Kalyan Kondapally
Apparatus and method for performing dual signed and unsigned multiplication of packed data elements

Patent number: 10802826

Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements.

Type: Grant

Filed: September 29, 2017

Date of Patent: October 13, 2020

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang
Providing efficient floating-point operations using matrix processors in processor-based systems

Patent number: 10747501

Abstract: Providing efficient floating-point operations using matrix processors in processor-based systems is disclosed. In this regard, a matrix-processor-based device provides a matrix processor comprising a positive partial sum accumulator and a negative partial sum accumulator. As the matrix processor processes pairs of floating-point operands, the matrix processor calculates an intermediate product based on a first floating-point operand and a second floating-point operand and determines a sign of the intermediate product. Based on the sign, the matrix processor normalizes the intermediate product with a partial sum fraction of the positive partial sum accumulator or the negative partial sum accumulator, then adds the intermediate product to the positive sum accumulator or the negative sum accumulator.

Type: Grant

Filed: August 30, 2018

Date of Patent: August 18, 2020

Assignee: Qualcomm Incorporated

Inventors: Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Robert Dreyer, Colin Beaton Verrilli, Koustav Bhattacharya
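
The sign-based routing in the abstract above can be shown with a simplified sketch: each product goes to one of two accumulators depending on its sign, and the two are combined only at the end. The per-step normalization against the accumulator's partial sum fraction is omitted, and the function name is hypothetical.

```python
def dot_with_split_accumulators(xs, ys):
    """Dot product that keeps positive and negative products apart."""
    positive = 0.0   # running sum of non-negative products
    negative = 0.0   # running sum of the magnitudes of negative products
    for x, y in zip(xs, ys):
        product = x * y
        if product < 0:
            negative += -product
        else:
            positive += product
    # Only one subtraction is needed, after all products are in.
    return positive - negative
```

Within each accumulator every addition has operands of the same sign, so no intermediate step involves cancellation; that is one plausible motivation for the split, though the abstract does not state it.
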
Sum-of-products accelerator array

Patent number: 10719296

Abstract: A device for generating sum-of-products data includes an array of variable resistance cells, variable resistance cells in the array each comprising a programmable threshold transistor and a resistor connected in parallel, the array including n columns of cells including strings of series-connected cells and m rows of cells. Control and bias circuitry are coupled to the array, including logic for programming the programmable threshold transistors in the array with thresholds corresponding to values of a weight factor Wmn for the corresponding cell. Input drivers are coupled to corresponding ones of the m rows of cells, the input drivers selectively applying inputs Xm to rows m. Column drivers are configured to apply currents In to corresponding ones of the n columns of cells. Voltage sensing circuits are operatively coupled to the columns of cells.

Type: Grant

Filed: January 17, 2018

Date of Patent: July 21, 2020

Assignee: MACRONIX INTERNATIONAL CO., LTD.

Inventors: Feng-Min Lee, Yu-Yu Lin
Apparatus and method for multiplying, summing, and accumulating sets of packed bytes

Patent number: 10705839

Abstract: A processor having a decoder to decode an instruction to generate a decoded instruction; a first source register to store a first plurality of packed signed bytes; a second source register to store a second plurality of packed signed bytes; execution circuitry to execute the decoded instruction, the execution circuitry including: multiplier circuitry to multiply each packed signed byte from the first source register with a corresponding packed signed byte from the second source register to generate temporary products, adder circuitry to add a plurality of sets of the temporary products to generate a plurality of temporary sums; negation and extension circuitry to negate and extend each of the temporary sums to doublewords sums; and accumulation circuitry to add each of the doublewords sums to a doubleword from a third source register to generate final doubleword results; and a packed data destination register to store the final doubleword results.

Type: Grant

Filed: December 21, 2017

Date of Patent: July 7, 2020

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark Charney, Jesus Corbal
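
The sequence in the abstract above (multiply signed bytes, add sets of products, negate and extend each sum to a doubleword, accumulate with a third source) can be traced with a scalar model. Grouping four bytes per doubleword lane and wrapping rather than saturating on overflow are assumptions, as are the function names.

```python
def as_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def neg_mul_add_bytes(src1, src2, acc, group=4):
    """Per lane: acc[lane] - sum of `group` signed byte products (32-bit wrap)."""
    if not (len(src1) == len(src2) == group * len(acc)):
        raise ValueError("expected `group` bytes per accumulator lane")
    results = []
    for lane, base in enumerate(acc):
        lo = lane * group
        # Temporary products of corresponding packed signed bytes.
        products = [as_signed(a, 8) * as_signed(b, 8)
                    for a, b in zip(src1[lo:lo + group], src2[lo:lo + group])]
        temp_sum = sum(products)
        # Negate the temporary sum, then accumulate into the doubleword.
        results.append(as_signed(base - temp_sum, 32))
    return results
```

With src1 = [1, 2, 3, 4, 0xFF, 0xFF, 0xFF, 0xFF], src2 = [1, 1, 1, 1, 2, 2, 2, 2], and acc = [100, 0], the lanes evaluate to 100 - 10 and 0 - (-8).
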
Apparatus and method for multiplication and accumulation of complex and real packed data elements

Patent number: 10552154

Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers. A method comprises: multiplying selected imaginary and real data elements in a first and second source registers to generate a plurality of imaginary products; adding a first subset of the plurality of imaginary products to generate a first temporary result and adding a second subset of the plurality of imaginary products to generate a second temporary result; negating the first temporary result to generate a third temporary result and the second temporary result to generate a fourth temporary result; accumulating the third temporary result with first data to generate a first final result and accumulating the fourth temporary result with second data to generate a second final result; and storing the first final result and the second final result.

Type: Grant

Filed: September 29, 2017

Date of Patent: February 4, 2020

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang
Sparsity-aware hardware accelerators

Patent number: 10482156

Abstract: A special-purpose, hardware-based accelerator may include an input subsystem configured to receive first and second vectors as operands of a full dot-product operation. The accelerator may also include a sparsity-aware dot-product engine communicatively coupled to the input subsystem and configured to perform adaptive dot-product processing by: (1) identifying, within the first and second vectors, at least one zero-value element and (2) executing, in response to identifying the zero-value element, a reduced dot-product operation that excludes, relative to the full dot-product operation, at least one mathematical operation in which the zero-value element is an operand. The accelerator may also include an output subsystem that is communicatively coupled to the sparsity-aware dot-product engine and configured to send a result of the reduced dot-product operation to a storage subsystem. Various other accelerators, computing systems, and methods are also disclosed.

Type: Grant

Filed: December 29, 2017

Date of Patent: November 19, 2019

Assignee: Facebook, Inc.

Inventors: Abdulkadir Utku Diril, Jong Soo Park, Nadav Rotem, Mikhail Smelyanskiy
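
The reduced dot-product idea in the abstract above (patent 10482156) can be illustrated in software. This is a minimal sketch, not the patented hardware: the function name `sparse_dot` and the returned skip counter are assumptions made for illustration. It excludes any multiply-add in which a zero-value element is an operand.

```python
def sparse_dot(a, b):
    """Dot product that skips every multiply-add with a zero operand.

    Returns (result, skipped), where `skipped` counts the operations
    excluded relative to the full dot product. Both names are illustrative.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0
    skipped = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            # A zero operand contributes nothing, so the operation is dropped.
            skipped += 1
            continue
        total += x * y
    return total, skipped


if __name__ == "__main__":
    print(sparse_dot([1, 0, 3, 0], [4, 5, 0, 7]))
```

The result equals that of the full dot product; only the amount of arithmetic performed changes.
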
Performing concurrent operations in a processing element

Patent number: 10459876

Abstract: A processing element (PE) of a systolic array can perform neural network computations in parallel on two or more sequential data elements of an input data set using the same weight. Thus, two or more output data elements corresponding to an output data set may be generated in parallel. Based on the size of the input data set and an input data type, the systolic array can process a single data element or multiple data elements in parallel.

Type: Grant

Filed: January 31, 2018

Date of Patent: October 29, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Dana Michelle Vantrease, Ron Diamant
Memcapacitive cross-bar array for determining a dot product

Patent number: 10249356

Abstract: A method of obtaining a dot product includes applying a programming signal to a number of capacitive memory devices coupled at a number of junctions formed between a number of row lines and a number of column lines. The programming signal defines a number of values within a matrix. The method further includes applying a vector signal. The vector signal defines a number of vector values to be applied to the capacitive memory devices.

Type: Grant

Filed: October 28, 2014

Date of Patent: April 2, 2019

Assignee: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP

Inventors: Ning Ge, John Paul Strachan, Jianhua Yang, Miao Hu
Apparatus employing user-specified binary point fixed point arithmetic

Patent number: 10228911

Abstract: An apparatus includes a plurality of arithmetic logic units each having an accumulator and an integer arithmetic unit that receives and performs integer arithmetic operations on integer inputs and accumulates integer results of a series of the integer arithmetic operations into the accumulator as an integer accumulated value. A register is programmable with an indication of a number of fractional bits of the integer accumulated values and an indication of a number of fractional bits of integer outputs. A first bit width of the accumulator is greater than twice a second bit width of the integer outputs. A plurality of adjustment units scale and saturate the first bit width integer accumulated values to generate the second bit width integer outputs based on the indications of the number of fractional bits of the integer accumulated values and outputs programmed into the register.

Type: Grant

Filed: April 5, 2016

Date of Patent: March 12, 2019

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: G. Glenn Henry, Terry Parks
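
The scale-and-saturate step in the abstract above (patent 10228911) can be sketched as follows. This is an illustration under stated assumptions, not the patented apparatus: the function name and parameters are hypothetical, values are signed two's-complement integers, and excess fractional bits are dropped by an arithmetic right shift (the abstract does not specify a rounding mode).

```python
def scale_and_saturate(acc, acc_frac_bits, out_frac_bits, out_width):
    """Convert a wide integer accumulator into a narrower fixed-point output.

    acc           -- accumulated integer value (wide)
    acc_frac_bits -- number of fractional bits in the accumulator
    out_frac_bits -- number of fractional bits wanted in the output
    out_width     -- total bit width of the signed output
    """
    shift = acc_frac_bits - out_frac_bits
    if shift >= 0:
        # Drop surplus fractional bits (floor; an assumption of this sketch).
        scaled = acc >> shift
    else:
        # The output has more fractional bits than the accumulator.
        scaled = acc << -shift
    lo = -(1 << (out_width - 1))
    hi = (1 << (out_width - 1)) - 1
    # Saturate instead of wrapping when the value does not fit.
    return max(lo, min(hi, scaled))


if __name__ == "__main__":
    # 3.5 with 8 fractional bits, converted to 4 fractional bits in 8 bits.
    print(scale_and_saturate(896, 8, 4, 8))
```
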
Matrix circuits

Patent number: 10055383

Abstract: A circuit is provided. In an example, the circuit includes a memory array that includes a plurality of memory cells to store a matrix and a plurality of data lines coupled to the plurality of memory cells to provide a first set of values of the matrix. The circuit includes a multiplier coupled to the plurality of data lines to multiply the first set of values by a second set of values to produce a third set of values. A summing unit is included that is coupled to the multiplier to sum the third set of values to produce a sum. The circuit includes a shifting unit coupled to the summing unit to shift the sum and to add the shifted sum to a running total.

Type: Grant

Filed: April 28, 2017

Date of Patent: August 21, 2018

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Ali Shafiee Ardestani, Naveen Muralimanohar
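
The multiply, sum, shift, and accumulate flow in the abstract above (patent 10055383) resembles a bit-serial dot product. The sketch below assumes that the second set of values is one bit-slice of an unsigned input vector per step; the abstract does not state this, and the function name is hypothetical.

```python
def bit_serial_dot(row, vec, vec_bits):
    """Dot product of `row` and `vec`, consuming one bit of `vec` per step.

    Assumes every element of `vec` is an unsigned integer below 2**vec_bits.
    """
    running_total = 0
    for bit in range(vec_bits):
        # Second set of values: the current bit of each vector element.
        bit_slice = [(v >> bit) & 1 for v in vec]
        # Third set of values: element-wise products.
        products = [r * s for r, s in zip(row, bit_slice)]
        partial_sum = sum(products)
        # Shift the sum by its bit weight and add it to the running total.
        running_total += partial_sum << bit
    return running_total


if __name__ == "__main__":
    print(bit_serial_dot([3, 5, 2], [6, 1, 7], 3))
```
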
Information processing device, information processing method, and information processing program for data compression

Patent number: 9893742

Abstract: An information processing method for a computer for data compression, the method includes: performing projective transformation of a first numeric string corresponding to an input signal into a second numeric string which contains more components than the first numeric string and having a sum of squares of components as a predetermined value by using a plurality of projective parameters; and generating a bit string in which bits indicating positive and negative signs of the respective components of an operation result obtained by a vector product operation of the second numeric string obtained by the projective transformation and an observation matrix are arranged.

Type: Grant

Filed: September 12, 2017

Date of Patent: February 13, 2018

Assignee: FUJITSU LIMITED

Inventor: Yui Noma
Unified multiply unit

Patent number: 9710228

Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for performing multi-precision single instruction multiple data (SIMD) operations on integer, fixed point and floating point operands. Disclosed embodiments pertain to a circuit that is capable of performing concurrent multiply, fused multiply-add, rounding, saturation, and dot products on the above operand types. In addition, the circuit may facilitate 64-bit multiplication when Newton-Raphson, divide and square root operations are performed.

Type: Grant

Filed: December 29, 2014

Date of Patent: July 18, 2017

Assignee: Imagination Technologies Limited

Inventor: Leonard Rarick
Integrated circuit device and methods of performing bit manipulation therefor

Patent number: 9639362

Abstract: An integrated circuit device comprising at least one instruction processing module arranged to receive a bit-manipulation instruction, and in response to receiving the bit-manipulation instruction to select at least one bit from at least one source data register in accordance with a value of at least one control bit, select from candidate values a manipulation value for the at least one selected bit in accordance with a value of at least one further control bit, and store the selected manipulation value for the at least one selected bit in at least one output data register.

Type: Grant

Filed: March 30, 2011

Date of Patent: May 2, 2017

Assignee: NXP USA, INC.

Inventors: Noam Eshel-Goldman, Aviram Amir, Itzhak Barak, Amir Kleen
Memory controllers

Patent number: 9513912

Abstract: Methods and controllers for executing an instruction set are provided. In one such method, executing an instruction set includes executing an instruction of one type in the instruction set, executing a context switch instruction, and executing an instruction of a second type in the instruction set. In one such controller, a single machine executes instructions in an instruction set with instructions having an operational code, and instructions that do not have an operational code.

Type: Grant

Filed: July 27, 2012

Date of Patent: December 6, 2016

Assignee: Micron Technology, Inc.

Inventors: Luca De Santis, Maria-Luisa Gallese, Emanuele Sirizotti, Walter Di-Francesco
Signed multiplier circuit utilizing a uniform array of logic blocks

Patent number: 9411554

Abstract: A signed multiplier circuit includes a two-dimensional array of substantially similar logic blocks. Each of the logic blocks is programmable to implement any of four multiply functions of first and second inputs, in which: the first and second inputs are both signed; the first and second inputs are both unsigned; the first input is signed and the second input is unsigned; and the first input is unsigned and the second input is signed. Each logic block includes rows and columns of sub-circuits, e.g., logical AND gates and full adders. One row and one column of each logic block include a programmably invertible AND gate, with the row and column being independently controlled. The ability to program the logic block to perform all four of these functions enables the combination of rows and columns of the logic blocks to build large signed multipliers of virtually any size.

Type: Grant

Filed: April 2, 2009

Date of Patent: August 9, 2016

Assignee: XILINX, INC.

Inventors: Steven P. Young, Brian C. Gaide
Content addressable memory

Patent number: 9280633

Abstract: A method of designing a content-addressable memory (CAM) includes associating CAM cells with a summary circuit. The summary circuit includes a first level of logic gates and a second level of logic gates. The first level of logic gates have inputs each configured to receive an output of a corresponding one of the plurality of CAM cells. The second level of logic gates have inputs each configured to receive an output of a corresponding one of the first level of logic gates. Logic gates in at least one of the first level of logic gates or the second level of logic gates are selected to have an odd number of input pins so that an input pin and an output pin share a layout sub-slot.

Type: Grant

Filed: May 16, 2014

Date of Patent: March 8, 2016

Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD.

Inventors: Young Seog Kim, Kuoyuan Hsu, Jacklyn Chang
Polynomial calculations optimized for programmable integrated circuit device structures

Patent number: 9207909

Abstract: Polynomial circuitry includes a respective partial product generator for each bit position of each term of a plurality of terms of a polynomial to be evaluated. A respective plurality of adders for each bit position adds partial products of a respective bit position across all of the plurality of terms to provide a respective bit-slice sum. Resulting bit-slice sums are offset from one another according to their respective bit positions. A final adder adds together the respective offset bit-slice sums to provide a result.

Type: Grant

Filed: March 8, 2013

Date of Patent: December 8, 2015

Assignee: Altera Corporation

Inventor: Martin Langhammer
Flexible accumulator in digital signal processing circuitry

Patent number: 9170775

Abstract: A multiplier-accumulator (MAC) block can be programmed to operate in one or more modes. When the MAC block implements at least one multiply-and-accumulate operation, the accumulator value can be zeroed without introducing clock latency or initialized in one clock cycle. To zero the accumulator value, the most significant bits (MSBs) of data representing zero can be input to the MAC block and sent directly to the add-subtract-accumulate unit. Alternatively, dedicated configuration bits can be set to clear the contents of a pipeline register for input to the add-subtract-accumulate unit.

Type: Grant

Filed: January 7, 2010

Date of Patent: October 27, 2015

Assignee: Altera Corporation

Inventors: Leon Zheng, Martin Langhammer, Nitin Prasad, Greg Starr, Chiao Kai Hwang, Kumara Tharmalingam
Exponentiation system

Patent number: 8930435

Abstract: A method for computation, including defining a sequence of n bits that encodes an exponent d, such that no more than a specified number of successive bits in the sequence are the same, initializing first and second registers using a value of a base x that is to be exponentiated, whereby the first and second registers hold respective first and second values, which are successively updated during the computation, successively, for each bit in the sequence computing a product of the first and second values, depending on whether the bit is one or zero, selecting one of the first and second registers, and storing the product in the selected one of the registers, whereby the first and second registers hold respective first and second final values upon completion of the sequence, and returning x^d based on the first and second final values. Related apparatus and methods are also described.

Type: Grant

Filed: September 21, 2010

Date of Patent: January 6, 2015

Assignee: Cisco Technology Inc.

Inventors: Yaacov Belenky, Zeev Geyzel
Digital signal processing circuitry with redundancy and bidirectional data paths

Patent number: 8805916

Abstract: Digital signal processing (“DSP”) circuit blocks are provided that can more easily work together to perform larger (e.g., more complex and/or more arithmetically precise) DSP operations if desired. These DSP blocks may also include redundancy circuitry that facilitates stitching together multiple such blocks despite an inability to use some block (e.g., because of a circuit defect).

Type: Grant

Filed: March 3, 2009

Date of Patent: August 12, 2014

Assignee: Altera Corporation

Inventors: Martin Langhammer, Yi-Wen Lin, Keone Streicher
Montgomery multiplication circuit

Patent number: 8793300

Abstract: A circuit for calculating a sum of products, each product having a q-bit binary operand and a k-bit binary operand, where k is a multiple of q, includes a q-input carry-save adder (CSA); a multiplexer (10) by input of the adder, having four k-bit channels respectively receiving the value 0, a first (Yi) of the k-bit operands, the second k-bit operand (M[63:0], mi), and the sum of the two k-bit operands, the output of a multiplexer of rank t (where t is between 0 and q−1) being taken into account by the adder with a t-bit left shift; and each multiplexer having first and second path selection inputs, the bits of a first of the q-bit operands being respectively supplied to the first selection inputs, and the bits of the second q-bit operand being respectively supplied to the second selection inputs.

Type: Grant

Filed: April 11, 2012

Date of Patent: July 29, 2014

Assignee: INSIDE Secure

Inventor: Michael Niel
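
The multiplexer arrangement in the abstract above (patent 8793300) computes a sum of two products, a*Y + b*M, from q four-way selections. The sketch below is a software model under assumptions: the names `a`, `b`, `Y`, and `M` are illustrative, operands are non-negative integers, and the carry-save adder is modelled by ordinary integer addition.

```python
def mux_sum_of_products(a, b, Y, M, q):
    """Return a*Y + b*M using one 4-way multiplexer per bit position t.

    a, b -- the two q-bit operands that drive the selection inputs
    Y, M -- the two k-bit operands fed to the multiplexer channels
    """
    total = 0
    for t in range(q):
        sel_a = (a >> t) & 1  # first selection input: bit t of a
        sel_b = (b >> t) & 1  # second selection input: bit t of b
        # Four channels: 0, Y, M, and the precomputed sum Y + M.
        channel = (0, Y, M, Y + M)[sel_a + 2 * sel_b]
        # The multiplexer of rank t enters the adder with a t-bit left shift.
        total += channel << t
    return total


if __name__ == "__main__":
    print(mux_sum_of_products(5, 3, 1000, 77, 4))
```

Each bit position contributes (a_t*Y + b_t*M) shifted left by t bits, so summing over all t yields a*Y + b*M.
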
System and method for implementing elliptic curve scalar multiplication in cryptography

Patent number: 8649508

Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.

Type: Grant

Filed: September 29, 2008

Date of Patent: February 11, 2014

Assignee: Tata Consultancy Services Ltd.

Inventor: Natarajan Vijayarangan
Multiplier-accumulator circuitry and methods

Patent number: 8645450

Abstract: Multiplier-accumulator circuitry includes circuitry for forming a plurality of partial products of multiplier and multiplicand inputs, carry-save adder circuitry for adding together the partial products and another input to produce intermediate sum and carry outputs, final adder circuitry for adding together the intermediate sum and carry outputs to produce a final output, and feedback circuitry for applying the final output (typically after some delay, e.g., due to registration of the final output) to the carry-save adder circuitry as said another input. The above circuitry may be implemented in so-called “hard IP” (intellectual property) of a field-programmable gate array (“FPGA”) integrated circuit device. If desired, any overflow from the accumulation performed by the above circuitry may be accumulated in “soft” accumulator-overflow circuitry that is implemented in the general-purpose programmable logic of the FPGA.

Type: Grant

Filed: March 2, 2007

Date of Patent: February 4, 2014

Assignee: Altera Corporation

Inventors: Kok Heng Choe, Tony K Ngai, Henry Y. Lui
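
The data path in the abstract above (patent 8645450) can be modelled in a few lines. This is a sketch under assumptions, not the patented circuit: the function names are hypothetical, operands are non-negative integers, and the carry-save stage is modelled as a chain of 3:2 compressors that folds in the partial products and the fed-back accumulator before a single final addition.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three addends to a sum word and a carry word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def mac_step(acc, multiplier, multiplicand, bits):
    """Return acc + multiplier * multiplicand for a `bits`-wide multiplier."""
    # Partial products: the multiplicand shifted by each set multiplier bit.
    operands = [multiplicand << i for i in range(bits) if (multiplier >> i) & 1]
    # Feedback path: the previous final output is one more adder input.
    operands.append(acc)
    sum_word, carry_word = 0, 0
    for operand in operands:
        sum_word, carry_word = carry_save_add(sum_word, carry_word, operand)
    # Final adder: the only carry-propagating addition in the step.
    return sum_word + carry_word


if __name__ == "__main__":
    acc = 0
    for x, y in [(3, 4), (5, 6)]:
        acc = mac_step(acc, x, y, 4)
    print(acc)
```
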
Specialized processing block for programmable integrated circuit device

Patent number: 8543634

Abstract: A specialized processing block such as a DSP block may be enhanced by including direct connections that allow the block output to be directly connected to either the multiplier inputs or the adder inputs of another such block. A programmable integrated circuit device may include a plurality of such specialized processing blocks. The specialized processing block includes a multiplier having two multiplicand inputs and a product output, an adder having as one adder input the product output of the multiplier, and having a second adder input and an adder output, a direct-connect output of the adder output to a first other one of the specialized processing block, and a direct-connect input from a second other one of the specialized processing block. The direct-connect input connects a direct-connect output of that second other one of the specialized processing block to a first one of the multiplicand inputs.

Type: Grant

Filed: March 30, 2012

Date of Patent: September 24, 2013

Assignee: Altera Corporation

Inventors: Lei Xu, Volker Mauer, Steven Perry
