Patents Examined by Andrew Caldwell
  • Patent number: 10885433
    Abstract: A neural network apparatus configured to perform a deconvolution operation includes a memory configured to store a first kernel; and a processor configured to: obtain, from the memory, the first kernel; calculate a second kernel by adjusting an arrangement of matrix elements comprised in the first kernel; generate sub-kernels by dividing the second kernel; perform a convolution operation between an input feature map and the sub-kernels using a convolution operator; and generate an output feature map, as a deconvolution of the input feature map, by merging results of the convolution operation.
    Type: Grant
    Filed: August 21, 2018
    Date of Patent: January 5, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Joonho Song, Sehwan Lee, Junwoo Jang
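The abstract's pipeline (rearrange the kernel, divide it into sub-kernels, convolve, merge) can be illustrated with a one-dimensional sketch. It is not the patented implementation: the function names are hypothetical, "adjusting the arrangement" is taken to mean reversing the kernel, the "convolution operator" is taken to be a sliding-window multiply-accumulate, and the kernel length is assumed to be a multiple of the stride.

```python
def transposed_conv1d_direct(x, kernel, stride):
    """Reference deconvolution: scatter each input times the kernel into the output."""
    out = [0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += xi * kj
    return out


def correlate_valid(x, kernel):
    """Sliding-window multiply-accumulate without kernel flipping."""
    n = len(x) - len(kernel) + 1
    return [sum(x[t + u] * kernel[u] for u in range(len(kernel))) for t in range(n)]


def transposed_conv1d_subkernels(x, kernel, stride):
    """Same result, computed from ordinary convolutions with sub-kernels."""
    if len(kernel) % stride != 0:
        raise ValueError("this sketch assumes len(kernel) is a multiple of stride")
    flipped = kernel[::-1]                      # "second kernel": rearranged elements
    taps = len(kernel) // stride                # length of every sub-kernel
    padded = [0] * (taps - 1) + list(x) + [0] * (taps - 1)
    out = [0] * ((len(x) - 1) * stride + len(kernel))
    for r in range(stride):
        sub_kernel = flipped[r::stride]         # divide the second kernel
        phase = stride - 1 - r                  # output positions this sub-kernel fills
        out[phase::stride] = correlate_valid(padded, sub_kernel)  # merge by interleaving
    return out


x = [1, 2, 3]
kernel = [1, 2, 3, 4]
print(transposed_conv1d_subkernels(x, kernel, stride=2))
```

Each sub-kernel produces every `stride`-th output sample, so interleaving the partial results reproduces the direct scatter-style deconvolution while using only a plain convolution primitive.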
  • Patent number: 10878310
    Abstract: Described embodiments include a system that includes one or more buffers and circuitry. The circuitry is configured to process a plurality of input values, by identifying each of the input values that is not zero-valued, and, for each value of the identified input values, computing respective products of coefficients of a kernel with the value and storing at least some of the respective products in the buffers. The circuitry is further configured to compute a plurality of output values, by retrieving respective sets of stored values from the buffers, at least some of the retrieved sets including one or more of the products, and summing the retrieved sets. The circuitry is further configured to output the computed output values. Other embodiments are also described.
    Type: Grant
    Filed: November 15, 2017
    Date of Patent: December 29, 2020
    Assignee: MELLANOX TECHNOLOGIES, LTD.
    Inventors: Dotan Levi, Tal Anker, Ohad Markus
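A rough software analogue of the zero-skipping scheme in this abstract, written as a one-dimensional sketch. The function name, the per-output list used as a "buffer", and the multiply counter are assumptions made for illustration; the patent describes circuitry, not this code.

```python
def zero_skipping_conv1d(inputs, kernel):
    """Full 1D convolution that multiplies only for non-zero inputs."""
    # One buffer per output position, holding the products that contribute to it.
    buffers = [[] for _ in range(len(inputs) + len(kernel) - 1)]
    multiplies = 0
    for i, value in enumerate(inputs):
        if value == 0:
            continue                      # zero-valued inputs are never multiplied
        for j, coeff in enumerate(kernel):
            buffers[i + j].append(coeff * value)
            multiplies += 1
    # Each output is the sum of the stored products retrieved from its buffer.
    outputs = [sum(products) for products in buffers]
    return outputs, multiplies


outputs, multiplies = zero_skipping_conv1d([0, 3, 0, 0, 2], [1, -1, 2])
print(outputs, multiplies)
```

With two non-zero inputs and three kernel coefficients, the sketch performs six multiplications instead of the fifteen a dense loop would need.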
  • Patent number: 10877732
    Abstract: A binary logic circuit for determining y = x mod (2^m - 1), where x is an n-bit integer, y is an m-bit integer, and n > m, includes reduction logic configured to reduce x to a sum of a first m-bit integer β and a second m-bit integer γ; and addition logic configured to calculate an addition output represented by the m least significant bits of the following sum right-shifted by m: a first binary value of length 2m, the m most significant bits and the m least significant bits each being the string of bit values represented by β; a second binary value of length 2m, the m most significant bits and the m least significant bits each being the string of bit values represented by γ; and the binary value 1.
    Type: Grant
    Filed: May 13, 2020
    Date of Patent: December 29, 2020
    Assignee: Imagination Technologies Limited
    Inventor: Thomas Rose
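The addition step in this abstract can be checked numerically. The sketch below takes the modulus to be 2^m - 1 and uses the hypothetical names `beta` and `gamma` for the two m-bit integers. The reduction shown (folding x one m-bit chunk at a time) and the final all-ones check are simplifications added here, not the reduction logic the patent describes.

```python
def add_mod_mersenne(beta, gamma, m):
    """Add two m-bit values modulo 2**m - 1 using the doubled-word sum."""
    mask = (1 << m) - 1
    doubled_beta = (beta << m) | beta      # 2m-bit value: beta repeated twice
    doubled_gamma = (gamma << m) | gamma   # 2m-bit value: gamma repeated twice
    total = doubled_beta + doubled_gamma + 1
    return (total >> m) & mask             # m least significant bits after the shift


def mod_mersenne(x, m):
    """Compute x mod (2**m - 1) by folding m-bit chunks of x together."""
    mask = (1 << m) - 1
    acc = x & mask
    x >>= m
    while x:
        acc = add_mod_mersenne(acc, x & mask, m)
        x >>= m
    # All ones is congruent to zero; this explicit check is an addition of this sketch.
    return 0 if acc == mask else acc


print(mod_mersenne(1000, 4), 1000 % 15)
```

The extra `+ 1` makes the upper half of the sum absorb the end-around carry: when `beta + gamma` reaches or exceeds 2^m - 1, the shifted result already has the modulus subtracted.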
  • Patent number: 10879877
    Abstract: Provided herein is an implementation of a finite impulse response (FIR) filter that uses a distributed arithmetic architecture. In one or more examples, a data sample with multiple bits is processed through a plurality of bit-level multiply and accumulate circuits, wherein each bit-level multiply and accumulate circuit corresponds to a bit of the data sample. The output of each bit-level multiply and accumulate circuit can then be shifted by an appropriate amount based on the bit placement of the bit of the data sample that corresponds to the bit-level multiply and accumulate circuit. After each output is shifted by the appropriate amount, the outputs can be aggregated to form a final FIR filter result.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: December 29, 2020
    Assignee: The MITRE Corporation
    Inventor: Rishi Yadav
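A minimal sketch of the bit-level decomposition this abstract describes, assuming unsigned integer samples and integer coefficients. The function name and the `sample_bits` parameter are hypothetical; the loop over `bit` stands in for the parallel bit-level multiply and accumulate circuits.

```python
def fir_bit_level(samples, coeffs, sample_bits):
    """One FIR output computed from per-bit accumulations, then shift and add."""
    result = 0
    for bit in range(sample_bits):
        # Bit-level MAC: a 1-bit multiply is just "add the coefficient or not".
        bit_sum = 0
        for sample, coeff in zip(samples, coeffs):
            if (sample >> bit) & 1:
                bit_sum += coeff
        # Shift by the bit's position, then aggregate into the final result.
        result += bit_sum << bit
    return result


window = [5, 3, 7]          # most recent samples, 3 bits each
taps = [2, -1, 4]           # filter coefficients
print(fir_bit_level(window, taps, sample_bits=3))
```

Because each sample equals the sum of its bits weighted by powers of two, the shifted per-bit sums add up to the ordinary dot product of samples and coefficients.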
  • Patent number: 10867008
    Abstract: Embodiments of the present invention provide a hierarchical, multi-layer Jacobi method for implementing a dense symmetric eigenvalue solver using multiple processors. Each layer of the hierarchical method is configured to process problems of different sizes, and the division between the layers is defined according to the configuration of the underlying computer system, such as memory capacity and processing power, as well as the communication overhead between device and host. In general, the higher-level Jacobi kernel methods call the lower-level Jacobi kernel methods, and the results are passed up the hierarchy. This process is iteratively performed until a convergence condition is reached. Embodiments of the hierarchical Jacobi method disclosed herein offer controllability of Schur decomposition, robust tolerance for passing data throughout the hierarchy, and significant cost reduction on row update compared to existing methods.
    Type: Grant
    Filed: September 7, 2018
    Date of Patent: December 15, 2020
    Assignee: NVIDIA Corporation
    Inventor: Lung-Sheng Chien
  • Patent number: 10867009
    Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.
    Type: Grant
    Filed: July 6, 2020
    Date of Patent: December 15, 2020
    Assignee: Intel Corporation
    Inventors: Srinivasan Narayanamoorthy, Nadathur Rajagopalan Satish, Alexey Suprun, Kenneth J. Janik
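The multiply-accumulate semantics stated in this abstract can be sketched in a few lines. The dictionary-of-coordinates storage for the sparse matrix and the function name are assumptions for illustration; the abstract does not specify an encoding.

```python
def sparse_dense_multiply_accumulate(sparse_nonzeros, dense, out):
    """out[m][n] += a * dense[k][n] for every non-zero a at (m, k) of the sparse matrix."""
    for (m, k), a in sparse_nonzeros.items():
        for n, b in enumerate(dense[k]):
            out[m][n] += a * b       # accumulate onto the previous output value
    return out


sparse = {(0, 1): 2, (1, 0): 3}      # only non-zero elements are stored
dense = [[1, 2], [3, 4]]
out = [[1, 1], [1, 1]]               # previous contents of the output matrix
print(sparse_dense_multiply_accumulate(sparse, dense, out))
```

Work is proportional to the number of non-zero elements times the dense matrix width, which is the saving a sparsity below one makes available.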
  • Patent number: 10861563
    Abstract: The present disclosure includes apparatuses and methods related to determining population count. An example apparatus comprises an array of memory cells coupled to sensing circuitry. The apparatus can include a controller configured to cause: summing, in parallel, of data values corresponding to respective ones of a plurality of first vectors stored in memory cells of the array as a data value sum representing a population count thereof, wherein a second vector is stored as the plurality of first vectors, and wherein each first vector of the plurality of first vectors is stored in respective memory cells of the array that are coupled to a respective sense line of a plurality of sense lines; and iteratively summing, in parallel, of data value sums corresponding to the plurality of first vectors to provide a single data value sum corresponding to the second vector.
    Type: Grant
    Filed: February 10, 2020
    Date of Patent: December 8, 2020
    Assignee: Micron Technology, Inc.
    Inventors: Timothy P. Finkbeiner, Glen E. Hush, Richard C. Murphy
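A software sketch of the two-stage summation in this abstract, assuming the long vector is split round-robin across columns (standing in for sense lines). The function name, the split order, and the pairwise reduction are assumptions; in the memory array these steps run in parallel, whereas here they are ordinary loops.

```python
def population_count(bits, num_columns):
    """Count set bits by per-column sums followed by iterative pairwise summing."""
    # Store the long ("second") vector as several shorter ("first") vectors.
    columns = [bits[c::num_columns] for c in range(num_columns)]
    # Stage 1: every column sums its own data values (done in parallel in hardware).
    sums = [sum(column) for column in columns]
    # Stage 2: repeatedly add neighbouring sums until one value remains.
    while len(sums) > 1:
        if len(sums) % 2:
            sums.append(0)           # pad so every sum has a partner
        sums = [sums[i] + sums[i + 1] for i in range(0, len(sums), 2)]
    return sums[0]


print(population_count([1, 0, 1, 1, 0, 1, 1, 0, 1, 1], num_columns=4))
```

The second stage needs only about log2(num_columns) rounds, which is why the iterative summing is attractive when many columns can be added at once.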
  • Patent number: 10860679
    Abstract: According to one embodiment, a calculating device includes a processor repeating a processing procedure. The processing procedure includes a first variable update and a second variable update. The first variable update includes updating an ith entry of a first variable xi by adding a first function to the ith entry of the first variable xi before the first variable update. The second variable update includes updating the ith entry of the second variable yi by adding a second function and a third function to the ith entry of the second variable yi before the second variable update. The processor performs at least an output of at least one of the ith entry of the first variable xi obtained after the repeating of the processing procedure or a function of the ith entry of the first variable xi obtained after the repeating of the processing procedure.
    Type: Grant
    Filed: September 6, 2018
    Date of Patent: December 8, 2020
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Hayato Goto, Kosuke Tatsumura
  • Patent number: 10860681
    Abstract: Aspects for matrix addition in a neural network are described herein. The aspects may include a controller unit configured to receive a matrix-add-scalar instruction that includes an address of a first matrix and a scalar value. The aspects may further include a computation module configured to receive the first matrix from a storage device based on the address of the first matrix. The first matrix may include one or more first elements. The one or more first elements are arranged in accordance with a two-dimensional data structure. The computation module may be further configured to respectively add the scalar value to each of the one or more first elements of the first matrix in accordance with the matrix-add-scalar instruction to generate one or more second elements for a second matrix.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: December 8, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10860316
    Abstract: Aspects for generating a dot product for two vectors in a neural network are described herein. The aspects may include a controller unit configured to receive a vector load instruction that includes a first address of a first vector and a length of the first vector. The aspects may further include a direct memory access unit configured to retrieve the first vector from a storage device based on the first address of the first vector. Further still, the aspects may include a caching unit configured to store the first vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: December 8, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Tian Zhi, Qi Guo, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10853721
    Abstract: According to an embodiment, a multiplier accumulator includes a controller, a high-order multiplier, a high-order accumulator, a low-order multiplier, and an output unit. The controller is configured to designate each digit within a range of the most significant digit in a coefficient for an input value to a stop digit as a target digit. The high-order multiplier is configured to calculate a high-order multiplication value by multiplying the input value, and a value and a weight of the target digit. The high-order accumulator is configured to calculate a high-order accumulation value by accumulatively adding the high-order multiplication values for input values. The low-order multiplier is configured to calculate a low-order multiplication value by multiplying an input value and a value of a digit smaller than the stop digit. The output unit is configured to output a value determined based on whether the high-order accumulation value exceeds a boundary value.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: December 1, 2020
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masafumi Mori, Takao Marukame, Tetsufumi Tanamoto, Satoshi Takaya
  • Patent number: 10853068
    Abstract: The present invention includes a method for operating a data processing system to compute an approximation to a scalar product between first and second vectors in which each vector is characterized by N components. The method includes replacing the first vector by a third vector that is a pyramid integer vector characterized by N components and an integer K equal to the sum of the absolute values of the N components, and computing a scalar product of the third vector with the second vector to provide the approximation to the scalar product between the first and second vectors. Computing the scalar product of the second and third vectors can be carried out by K additions followed by one floating point multiply.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: December 1, 2020
    Assignee: Ocean Logic Pty Ltd
    Inventor: Vincenzo Liguori
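The "K additions followed by one floating point multiply" claim can be illustrated with a short sketch. It assumes the pyramid integer vector is already given and that the original first vector is approximated by `scale * pyramid`; the `scale` factor and all names are assumptions for illustration, and the step that chooses the pyramid vector is not shown.

```python
def pyramid_dot(pyramid, scale, weights):
    """Approximate dot product using only additions plus one final multiply."""
    acc = 0.0
    additions = 0
    for p, w in zip(pyramid, weights):
        step = w if p > 0 else -w
        for _ in range(abs(p)):      # an integer component p costs |p| additions
            acc += step
            additions += 1
    return scale * acc, additions    # the single floating point multiply


pyramid = [2, -1, 0, 1]              # K = 2 + 1 + 0 + 1 = 4
weights = [0.5, 2.0, 3.0, -1.0]
print(pyramid_dot(pyramid, 0.25, weights))
```

The addition count equals K, the sum of the absolute values of the pyramid components, regardless of the vector length N.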
  • Patent number: 10853067
    Abstract: Embodiments detailed herein relate to arithmetic operations of floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, values of which being in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: December 1, 2020
    Assignee: Intel Corporation
    Inventors: Gregory Henry, Alexander Heinecke
  • Patent number: 10853034
    Abstract: An integrated circuit that includes common factor mass multiplier (CFMM) circuitry is provided that multiplies a common factor operand by a large number of multiplier operands. The CFMM circuitry may be implemented as an instance specific version or a non-instance specific version. The instance specific version might also be fully enumerated so that the hardware doesn't have to be redesigned assuming all possible unique multiplier values are implemented. Either version can be formed on a programmable integrated circuit or an application-specific integrated circuit. CFMM circuitry configured in this way can be used to support convolution neural networks or any operation that requires a straight common factor multiply. Any adder component with the CFMM circuitry may be implemented using bit-serial adders. The bit-serial adders may be further connected in a tree in CNN applications to sum together many input streams.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: December 1, 2020
    Assignee: Intel Corporation
    Inventors: Thiam Khean Hah, Jason Gee Hock Ong, Yeong Tat Liew, Carl Ebeling, Vamsi Nalluri
  • Patent number: 10855255
    Abstract: The present invention addresses the problem of increasing the likelihood of making it possible to reduce the consumption of power necessary for filter processing and the amount of heat generated during filter processing. In order to overcome this problem, a second complex signal and a third complex signal are generated from a first complex signal in a frequency domain, the third complex signal being a complex conjugate of the second complex signal. Signal selection is performed from the plurality of types of complex signals having different amounts of change in signal amplitude. Processing is performed on the complex signal selected as the signal using a first filter coefficient and a second filter coefficient. The complex signals after filter processing are synthesized to generate a complex signal, which is then outputted.
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: December 1, 2020
    Assignee: NEC CORPORATION
    Inventor: Atsufumi Shibayama
  • Patent number: 10853445
    Abstract: In order to reduce a circuit scale and power consumption while maintaining filter performance, a digital filter device includes a first transform circuit for executing a first transform process on data in a predetermined frequency range; a filtering circuit for executing a filtering process by setting an operation bit width of data of a preset first frequency component among the data, on which the first transform process was executed by the first transform circuit, to a different bit width from bit widths of other frequency components; and a second transform circuit for executing a second transform process on the data on which the filtering process was executed by the filtering circuit.
    Type: Grant
    Filed: April 14, 2017
    Date of Patent: December 1, 2020
    Assignee: NEC CORPORATION
    Inventor: Atsufumi Shibayama
  • Patent number: 10846053
    Abstract: Methods and apparatuses for generating a condition code for a floating point number operation prior to normalization. A processor receives an intermediate result for an operation, wherein the intermediate result comprises an intermediate significand and an intermediate exponent. A processor determines a mask based on the value of the intermediate exponent. A processor generates a masked significand by applying the mask to the intermediate significand. A processor generates a condition code based on the masked significand having a predetermined value.
    Type: Grant
    Filed: June 27, 2014
    Date of Patent: November 24, 2020
    Assignee: International Business Machines Corporation
    Inventors: Son T. Dao, Silvia Melitta Mueller
  • Patent number: 10846089
    Abstract: A binary logic circuit for manipulating an input binary string includes a first stage of a first group of multiplexers arranged to select respective portions of an input binary string and configured to receive a respective first control signal. A second stage is included in which a plurality of a second group of multiplexers is arranged to select respective portions of the input binary string and configured to receive a respective second control signal. The control signals are provided such that each multiplexer of a second group is configured to select a respective second portion of the first binary string. Control circuitry is configured to generate the first and second control signals such that two or more of the first groups and/or two or more of the second groups of multiplexers are independently controllable.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: November 24, 2020
    Assignee: MIPS Tech, LLC
    Inventors: James Hippisley Robinson, Morgyn Taylor
  • Patent number: 10846054
    Abstract: Methods and apparatuses for generating a condition code for a floating point number operation prior to normalization. A processor receives an intermediate result for an operation, wherein the intermediate result comprises an intermediate significand and an intermediate exponent. A processor determines a mask based on the value of the intermediate exponent. A processor generates a masked significand by applying the mask to the intermediate significand. A processor generates a condition code based on the masked significand having a predetermined value.
    Type: Grant
    Filed: December 17, 2014
    Date of Patent: November 24, 2020
    Assignee: International Business Machines Corporation
    Inventors: Son T. Dao, Silvia Melitta Mueller
  • Patent number: 10846365
    Abstract: A method for use in an associative memory device when multiplying by a sparse matrix includes storing only non-zero elements of the sparse matrix in the associative memory device as multiplicands. The storing includes locating the non-zero elements in computation columns of the associative memory device according to linear algebra rules along with their associated multiplicands such that a multiplicand and a multiplier of each multiplication operation to be performed are stored in a same computation column. The locating locates one of the non-zero elements in more than one computation column if one of the non-zero elements is utilized in more than one multiplication operation.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: November 24, 2020
    Assignee: GSI Technology Inc.
    Inventor: Avidan Akerib