Patents Examined by Matthew D Sandifer
  • Patent number: 12045308
    Abstract: Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprising: decode circuitry to decode an instruction have fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector and store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size is described.
    Type: Grant
    Filed: December 16, 2022
    Date of Patent: July 23, 2024
    Assignee: Intel Corporation
    Inventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
  • Patent number: 12045172
    Abstract: A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
    Type: Grant
    Filed: November 15, 2022
    Date of Patent: July 23, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Mujibur Rahman, Timothy David Anderson
  • Patent number: 12045309
    Abstract: In a system with control logic and a processing element array, two modes of operation may be provided. In the first mode of operation, the control logic may configure the system to perform matrix multiplication or 1×1 convolution. In the second mode of operation, the control logic may configure the system to perform 3×3 convolution. The processing element array may include an array of processing elements. Each of the processing elements may be configured to compute the dot product of two vectors in a single clock cycle, and further may accumulate the dot products that are sequentially computed over time.
    Type: Grant
    Filed: November 29, 2023
    Date of Patent: July 23, 2024
    Assignee: Recogni Inc.
    Inventors: Jian hui Huang, Gary S. Goldman
  • Patent number: 12045080
    Abstract: An optical computation system, preferably including an optical source, a splitter, and one or more phase accumulator banks. A phase accumulator bank, preferably including two optical paths, a plurality of phase accumulator units, and a detector module, and optionally including one or more compensation phase shifters. A method, preferably including receiving one or more optical inputs, receiving one or more electrical inputs, controlling one or more phase accumulator units based on the electrical inputs, and generating one or more electrical outputs based on optical signals.
    Type: Grant
    Filed: February 8, 2021
    Date of Patent: July 23, 2024
    Assignee: Luminous Computing, Inc.
    Inventors: Thomas W. Baehr-Jones, Mitchell A. Nahmias
  • Patent number: 12045307
    Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
    Type: Grant
    Filed: October 30, 2020
    Date of Patent: July 23, 2024
    Assignee: NVIDIA Corporation
    Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
  • Patent number: 12039290
    Abstract: In a multiply accumulate (MAC) unit, an accumulator may be implemented in two or more stages. For example, a first accumulator may accumulate products from the multiplier of the MAC unit, and a second accumulator may periodically accumulate the running total of the first accumulator. Each time the first accumulator's running total is accumulated by the second accumulator, the first accumulator may be initialized to begin a new accumulation period. In one embodiment, the number of values accumulated by the first accumulator within an accumulation period may be a user-adjustable parameter. In one embodiment, the bit width of the input of the second accumulator may be greater than the bit width of the output of the first accumulator. In another embodiment, an adder may be shared between the first and second accumulators, and a multiplexor may switch the accumulation operations between the first and second accumulators.
    Type: Grant
    Filed: January 9, 2024
    Date of Patent: July 16, 2024
    Assignee: Recogni Inc.
    Inventors: Jian hui Huang, Gary S. Goldman
  • Patent number: 12039289
    Abstract: Processing cores with data associative adaptive rounding and associated methods are disclosed herein. One disclosed processing core comprises an arithmetic logic unit cluster configured to generate a value for a unit of directed graph data using input directed graph data, a comparator coupled to a threshold register and a data register, a core controller configured to load a threshold value into the threshold register when the value for the unit of directed graph data is loaded into the data register, and a rounding circuit. The rounding circuit is configured to receive the value for the unit of directed graph data from the arithmetic logic unit cluster and conditionally round the value for the unit of directed graph data based on a comparator output from the comparator.
    Type: Grant
    Filed: April 10, 2023
    Date of Patent: July 16, 2024
    Assignee: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Alex Cejkov, Lejla Bajic
  • Patent number: 12015428
    Abstract: A system and/or an integrated circuit including: (a) a multiplier-accumulator execution pipeline including multiplier-accumulator circuits to process image data, using associated filter weights, via a plurality of multiply and accumulate operations and (b) first data format conversion circuitry including (i) inputs to receive filter weights of a plurality of sets of filter weights, wherein each set includes a plurality of filter weights each having a block-scaled fraction data format, (ii) conversion circuitry, coupled to the inputs, to convert the filter weights of each set from the block-scaled fraction data format to a floating point data format, and (iii) outputs to output the filter weights having the floating point data format. In operation, the multiplier-accumulator circuits of the pipeline are configured to perform the plurality of multiply and accumulate operations using (a) the image data and (b) the filter weights having the floating point data format.
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: June 18, 2024
    Assignee: Flex Logix Technologies, Inc.
    Inventors: Frederick A Ware, Cheng C. Wang
  • Patent number: 12014150
    Abstract: A tile of an FPGA includes a multiple mode arithmetic circuit. The multiple mode arithmetic circuit is configured by control signals to operate in an integer mode, a floating-point mode, or both. In some example embodiments, multiple integer modes (e.g., unsigned, two's complement, and sign-magnitude) are selectable, multiple floating-point modes (e.g., 16-bit mantissa and 8-bit sign, 8-bit mantissa and 6-bit sign, and 6-bit mantissa and 6-bit sign) are supported, or any suitable combination thereof. The tile may also fuse a memory circuit with the arithmetic circuits. Connections directly between multiple instances of the tile are also available, allowing multiple tiles to be treated as larger memories or arithmetic circuits. By using these connections, referred to as cascade inputs and outputs, the input and output bandwidth of the arithmetic circuit is further increased.
    Type: Grant
    Filed: March 23, 2023
    Date of Patent: June 18, 2024
    Assignee: Achronix Semiconductor Corporation
    Inventors: Daniel Pugh, Raymond Nijssen, Michael Philip Fitton, Marcel Van der Goot
  • Patent number: 12014264
    Abstract: A data processing circuit is disclosed. The data processing circuit relates to the field of digital circuits, and includes a first computing circuit and an input control circuit. The first computing circuit includes one or more computing sub-circuits. Each computing sub-circuit includes a first addition operation circuit, a multiplication operation circuit, a first comparison operation circuit, and a first nonlinear operation circuit. The first nonlinear operation circuit includes at least one of an exponential operation circuit and a logarithmic operation circuit. The input control circuit is configured to: control the first computing circuit to read input data and an input parameter, and control, according to a received first instruction, the operation circuit in the computing sub-circuit included in the first computing circuit, to perform an operation on the input data and the input parameter.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: June 18, 2024
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Zhanying He, Bin Xu, Honghui Yuan
  • Patent number: 12001508
    Abstract: A plurality of chiplets may be used to multiply two matrices A and B. Matrix A may be decomposed into horizontal stripes and matrix B may be decomposed into vertical stripes. Each of the horizontal stripes may be multiplied by each of the vertical stripes to form the output matrix C. Specifically, horizontal stripes may be stored in a stationary, distributed manner across the chiplets, while the vertical stripes (or sub-vertical stripes) may be passed between respective pairs of the chiplets until each of the vertical stripes (or sub-vertical stripes) of matrix B has been received and processed by each of the chiplets. The vertical stripes may be passed along one or more paths that interconnect the chiplets. Similar techniques can be applied to an arrangement in which the vertical stripes are stationary and the horizontal stripes (or sub-horizontal stripes) are passed between respective pairs of the chiplets.
    Type: Grant
    Filed: October 23, 2023
    Date of Patent: June 4, 2024
    Assignee: Persimmons, Inc.
    Inventor: James Michael Bodwin
  • Patent number: 11971949
    Abstract: A graphics processing unit (GPU) and a method is disclosed that performs a convolution operation recast as a matrix multiplication operation. The GPU includes a register file, a processor and a state machine. The register file stores data of an input feature map and data of a filter weight kernel. The processor performs a convolution operation on data of the input feature map and data of the filter weight kernel as a matrix multiplication operation. The state machine facilitates performance of the convolution operation by unrolling the data of the input feature map and the data of the filter weight kernel in the register file. The state machine includes control registers that determine movement of data through the register file to perform the matrix multiplication operation on the data in the register file in an unrolled manner.
    Type: Grant
    Filed: February 10, 2021
    Date of Patent: April 30, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Christopher P. Frascati, Simon Waters, Rama S. B Harihara, David C. Tannenbaum
  • Patent number: 11973519
    Abstract: Examples described herein relate to an apparatus comprising a central processing unit (CPU) and an encoding accelerator coupled to the CPU, the encoding accelerator comprising an entropy encoder to determine normalized probability of occurrence of a symbol in a set of characters using a normalized probability approximation circuitry, wherein the normalized probability approximation circuitry is to output the normalized probability of occurrence of a symbol in a set of characters for lossless compression. In some examples, the normalized probability approximation circuitry includes a shifter, adder, subtractor, or a comparator. In some examples, the normalized probability approximation circuitry is to determine normalized probability by performance of non-power of 2 division without computation by a Floating Point Unit (FPU). In some examples, the normalized probability approximation circuitry is to round the normalized probability to a decimal.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: April 30, 2024
    Assignee: Intel Corporation
    Inventors: Bhushan G. Parikh, Stephen T. Palermo
  • Patent number: 11962333
    Abstract: A data-compression analyzer can rapidly make a binary decision to compress or not compress an input data block or can use a slower neural network to predict the block's compression ratio with a regression model. A Concentration Value (CV) that is the sum of the squares of the frequencies and a Number of Zero (NZ) symbols are calculated from an un-sorted symbol frequency table. A rapid decision to compress is signaled when their product CV*NZ exceeds a horizontal threshold THH. During training, CV*NZ is plotted as a function of compression ratio C % for many training data blocks. Different test values of THH are applied to the plot to determine true and false positive rates that are plotted as a Receiver Operating Characteristic (ROC) curve. The point on the ROC curve having the largest Youden index is selected as the optimum THH for use in future binary decisions.
    Type: Grant
    Filed: August 19, 2022
    Date of Patent: April 16, 2024
    Assignee: Hong Kong Applied Science and Technology Research Institute Company Limited
    Inventors: Hailiang Li, Yan Huo, Tao Li
  • Patent number: 11943353
    Abstract: A computer processing system having an isogeny-based cryptosystem for randomizing computational hierarchy to protect against side-channel analysis in isogeny-based cryptosystems.
    Type: Grant
    Filed: December 17, 2020
    Date of Patent: March 26, 2024
    Assignee: PQSecure Technologies, LLC
    Inventors: Brian C. Koziel, Rami El Khatib
  • Patent number: 11934797
    Abstract: A processor to facilitate execution of a single-precision floating point operation on an operand is disclosed. The processor includes one or more execution units, each having a plurality of floating point units to execute one or more instructions to perform the single-precision floating point operation on the operand, including performing a floating point operation on an exponent component of the operand; and performing a floating point operation on a mantissa component of the operand, comprising dividing the mantissa component into a first sub-component and a second sub-component, determining a result of the floating point operation for the first sub-component and determining a result of the floating point operation for the second sub-component, and returning a result of the floating point operation.
    Type: Grant
    Filed: April 4, 2019
    Date of Patent: March 19, 2024
    Assignee: Intel Corporation
    Inventors: Abhishek Rhisheekesan, Shashank Lakshminarayana, Subramaniam Maiyuran
  • Patent number: 11934799
    Abstract: Combinatorial logic circuits with feedback, which include at least two combinatorial logic elements, are disclosed. At least one of the combinatorial logic elements receives an external input (i.e., from outside the circuit), at least one of the combinatorial logic elements receives an input that is feedback of the circuit output, and at least one of the combinatorial logic elements receives an input that is neither an external input nor an output of the circuit but rather is from another of the combinatorial logic elements and thus only “implicit” to the circuit. No staticizers are needed; the logic circuits effectively create implicit equations to perform functions that were previously thought to require sequential logic. The combinatorial logic circuits result in a stable output (in some instances after a brief period of time) due to the implicit equations, rather than achieving stability from an explicit expression of some input to the circuit.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: March 19, 2024
    Assignee: SiliconIntervention Inc.
    Inventor: A. Martin Mallinson
  • Patent number: 11928176
    Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Or, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix and to generate the output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: March 12, 2024
    Assignee: Arm Limited
    Inventors: Zhi-Gang Liu, Paul Nicholas Whatmough, Matthew Mattina
  • Patent number: 11914974
    Abstract: According to some embodiments, a system comprises a generator of a truly random signal is connected to an input and feedback device for the purpose of providing a user with real time feedback on the random signal. The user observes a representation of the signal in the process of an external physical event for the purpose of finding a correlation between the random output and what happens during the physical event. In some examples, the system is preferably designed such the system is shielded from all classically known forces such as gravity, physical pressure, motion, electromagnetic fields, humidity, etc. and/or, such classical forces are factored out of the process as much as possible. The system is thus designed to be selectively response to signals from living creatures, in particular, humans.
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: February 27, 2024
    Assignee: Psyleron, Inc.
    Inventors: John Valentino, Herb Mertz, Ian Cook
  • Patent number: 11907330
    Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
    Type: Grant
    Filed: February 17, 2023
    Date of Patent: February 20, 2024
    Assignee: Google LLC
    Inventors: Andrew Everett Phelps, Norman Paul Jouppi