Patents Examined by Matthew D Sandifer
-
Patent number: 12045308Abstract: Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprising: decode circuitry to decode an instruction have fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector and store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size is described.Type: GrantFiled: December 16, 2022Date of Patent: July 23, 2024Assignee: Intel CorporationInventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
-
Patent number: 12045172Abstract: A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.Type: GrantFiled: November 15, 2022Date of Patent: July 23, 2024Assignee: Texas Instruments IncorporatedInventors: Mujibur Rahman, Timothy David Anderson
-
Patent number: 12045309Abstract: In a system with control logic and a processing element array, two modes of operation may be provided. In the first mode of operation, the control logic may configure the system to perform matrix multiplication or 1×1 convolution. In the second mode of operation, the control logic may configure the system to perform 3×3 convolution. The processing element array may include an array of processing elements. Each of the processing elements may be configured to compute the dot product of two vectors in a single clock cycle, and further may accumulate the dot products that are sequentially computed over time.Type: GrantFiled: November 29, 2023Date of Patent: July 23, 2024Assignee: Recogni Inc.Inventors: Jian hui Huang, Gary S. Goldman
-
Patent number: 12045080Abstract: An optical computation system, preferably including an optical source, a splitter, and one or more phase accumulator banks. A phase accumulator bank, preferably including two optical paths, a plurality of phase accumulator units, and a detector module, and optionally including one or more compensation phase shifters. A method, preferably including receiving one or more optical inputs, receiving one or more electrical inputs, controlling one or more phase accumulator units based on the electrical inputs, and generating one or more electrical outputs based on optical signals.Type: GrantFiled: February 8, 2021Date of Patent: July 23, 2024Assignee: Luminous Computing, Inc.Inventors: Thomas W. Baehr-Jones, Mitchell A. Nahmias
-
Patent number: 12045307Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.Type: GrantFiled: October 30, 2020Date of Patent: July 23, 2024Assignee: NVIDIA CorporationInventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
-
Patent number: 12039290Abstract: In a multiply accumulate (MAC) unit, an accumulator may be implemented in two or more stages. For example, a first accumulator may accumulate products from the multiplier of the MAC unit, and a second accumulator may periodically accumulate the running total of the first accumulator. Each time the first accumulator's running total is accumulated by the second accumulator, the first accumulator may be initialized to begin a new accumulation period. In one embodiment, the number of values accumulated by the first accumulator within an accumulation period may be a user-adjustable parameter. In one embodiment, the bit width of the input of the second accumulator may be greater than the bit width of the output of the first accumulator. In another embodiment, an adder may be shared between the first and second accumulators, and a multiplexor may switch the accumulation operations between the first and second accumulators.Type: GrantFiled: January 9, 2024Date of Patent: July 16, 2024Assignee: Recogni Inc.Inventors: Jian hui Huang, Gary S. Goldman
-
Patent number: 12039289Abstract: Processing cores with data associative adaptive rounding and associated methods are disclosed herein. One disclosed processing core comprises an arithmetic logic unit cluster configured to generate a value for a unit of directed graph data using input directed graph data, a comparator coupled to a threshold register and a data register, a core controller configured to load a threshold value into the threshold register when the value for the unit of directed graph data is loaded into the data register, and a rounding circuit. The rounding circuit is configured to receive the value for the unit of directed graph data from the arithmetic logic unit cluster and conditionally round the value for the unit of directed graph data based on a comparator output from the comparator.Type: GrantFiled: April 10, 2023Date of Patent: July 16, 2024Assignee: Tenstorrent Inc.Inventors: Ljubisa Bajic, Alex Cejkov, Lejla Bajic
-
Patent number: 12015428Abstract: A system and/or an integrated circuit including: (a) a multiplier-accumulator execution pipeline including multiplier-accumulator circuits to process image data, using associated filter weights, via a plurality of multiply and accumulate operations and (b) first data format conversion circuitry including (i) inputs to receive filter weights of a plurality of sets of filter weights, wherein each set includes a plurality of filter weights each having a block-scaled fraction data format, (ii) conversion circuitry, coupled to the inputs, to convert the filter weights of each set from the block-scaled fraction data format to a floating point data format, and (iii) outputs to output the filter weights having the floating point data format. In operation, the multiplier-accumulator circuits of the pipeline are configured to perform the plurality of multiply and accumulate operations using (a) the image data and (b) the filter weights having the floating point data format.Type: GrantFiled: October 20, 2020Date of Patent: June 18, 2024Assignee: Flex Logix Technologies, Inc.Inventors: Frederick A Ware, Cheng C. Wang
-
Patent number: 12014150Abstract: A tile of an FPGA includes a multiple mode arithmetic circuit. The multiple mode arithmetic circuit is configured by control signals to operate in an integer mode, a floating-point mode, or both. In some example embodiments, multiple integer modes (e.g., unsigned, two's complement, and sign-magnitude) are selectable, multiple floating-point modes (e.g., 16-bit mantissa and 8-bit sign, 8-bit mantissa and 6-bit sign, and 6-bit mantissa and 6-bit sign) are supported, or any suitable combination thereof. The tile may also fuse a memory circuit with the arithmetic circuits. Connections directly between multiple instances of the tile are also available, allowing multiple tiles to be treated as larger memories or arithmetic circuits. By using these connections, referred to as cascade inputs and outputs, the input and output bandwidth of the arithmetic circuit is further increased.Type: GrantFiled: March 23, 2023Date of Patent: June 18, 2024Assignee: Achronix Semiconductor CorporationInventors: Daniel Pugh, Raymond Nijssen, Michael Philip Fitton, Marcel Van der Goot
-
Patent number: 12014264Abstract: A data processing circuit is disclosed. The data processing circuit relates to the field of digital circuits, and includes a first computing circuit and an input control circuit. The first computing circuit includes one or more computing sub-circuits. Each computing sub-circuit includes a first addition operation circuit, a multiplication operation circuit, a first comparison operation circuit, and a first nonlinear operation circuit. The first nonlinear operation circuit includes at least one of an exponential operation circuit and a logarithmic operation circuit. The input control circuit is configured to: control the first computing circuit to read input data and an input parameter, and control, according to a received first instruction, the operation circuit in the computing sub-circuit included in the first computing circuit, to perform an operation on the input data and the input parameter.Type: GrantFiled: August 28, 2020Date of Patent: June 18, 2024Assignee: HUAWEI TECHNOLOGIES CO., LTD.Inventors: Zhanying He, Bin Xu, Honghui Yuan
-
Patent number: 12001508Abstract: A plurality of chiplets may be used to multiply two matrices A and B. Matrix A may be decomposed into horizontal stripes and matrix B may be decomposed into vertical stripes. Each of the horizontal stripes may be multiplied by each of the vertical stripes to form the output matrix C. Specifically, horizontal stripes may be stored in a stationary, distributed manner across the chiplets, while the vertical stripes (or sub-vertical stripes) may be passed between respective pairs of the chiplets until each of the vertical stripes (or sub-vertical stripes) of matrix B has been received and processed by each of the chiplets. The vertical stripes may be passed along one or more paths that interconnect the chiplets. Similar techniques can be applied to an arrangement in which the vertical stripes are stationary and the horizontal stripes (or sub-horizontal stripes) are passed between respective pairs of the chiplets.Type: GrantFiled: October 23, 2023Date of Patent: June 4, 2024Assignee: Persimmons, Inc.Inventor: James Michael Bodwin
-
Patent number: 11971949Abstract: A graphics processing unit (GPU) and a method is disclosed that performs a convolution operation recast as a matrix multiplication operation. The GPU includes a register file, a processor and a state machine. The register file stores data of an input feature map and data of a filter weight kernel. The processor performs a convolution operation on data of the input feature map and data of the filter weight kernel as a matrix multiplication operation. The state machine facilitates performance of the convolution operation by unrolling the data of the input feature map and the data of the filter weight kernel in the register file. The state machine includes control registers that determine movement of data through the register file to perform the matrix multiplication operation on the data in the register file in an unrolled manner.Type: GrantFiled: February 10, 2021Date of Patent: April 30, 2024Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Christopher P. Frascati, Simon Waters, Rama S. B Harihara, David C. Tannenbaum
-
Patent number: 11973519Abstract: Examples described herein relate to an apparatus comprising a central processing unit (CPU) and an encoding accelerator coupled to the CPU, the encoding accelerator comprising an entropy encoder to determine normalized probability of occurrence of a symbol in a set of characters using a normalized probability approximation circuitry, wherein the normalized probability approximation circuitry is to output the normalized probability of occurrence of a symbol in a set of characters for lossless compression. In some examples, the normalized probability approximation circuitry includes a shifter, adder, subtractor, or a comparator. In some examples, the normalized probability approximation circuitry is to determine normalized probability by performance of non-power of 2 division without computation by a Floating Point Unit (FPU). In some examples, the normalized probability approximation circuitry is to round the normalized probability to a decimal.Type: GrantFiled: June 23, 2020Date of Patent: April 30, 2024Assignee: Intel CorporationInventors: Bhushan G. Parikh, Stephen T. Palermo
-
Patent number: 11962333Abstract: A data-compression analyzer can rapidly make a binary decision to compress or not compress an input data block or can use a slower neural network to predict the block's compression ratio with a regression model. A Concentration Value (CV) that is the sum of the squares of the frequencies and a Number of Zero (NZ) symbols are calculated from an un-sorted symbol frequency table. A rapid decision to compress is signaled when their product CV*NZ exceeds a horizontal threshold THH. During training, CV*NZ is plotted as a function of compression ratio C % for many training data blocks. Different test values of THH are applied to the plot to determine true and false positive rates that are plotted as a Receiver Operating Characteristic (ROC) curve. The point on the ROC curve having the largest Youden index is selected as the optimum THH for use in future binary decisions.Type: GrantFiled: August 19, 2022Date of Patent: April 16, 2024Assignee: Hong Kong Applied Science and Technology Research Institute Company LimitedInventors: Hailiang Li, Yan Huo, Tao Li
-
Patent number: 11943353Abstract: A computer processing system having an isogeny-based cryptosystem for randomizing computational hierarchy to protect against side-channel analysis in isogeny-based cryptosystems.Type: GrantFiled: December 17, 2020Date of Patent: March 26, 2024Assignee: PQSecure Technologies, LLCInventors: Brian C. Koziel, Rami El Khatib
-
Patent number: 11934797Abstract: A processor to facilitate execution of a single-precision floating point operation on an operand is disclosed. The processor includes one or more execution units, each having a plurality of floating point units to execute one or more instructions to perform the single-precision floating point operation on the operand, including performing a floating point operation on an exponent component of the operand; and performing a floating point operation on a mantissa component of the operand, comprising dividing the mantissa component into a first sub-component and a second sub-component, determining a result of the floating point operation for the first sub-component and determining a result of the floating point operation for the second sub-component, and returning a result of the floating point operation.Type: GrantFiled: April 4, 2019Date of Patent: March 19, 2024Assignee: Intel CorporationInventors: Abhishek Rhisheekesan, Shashank Lakshminarayana, Subramaniam Maiyuran
-
Patent number: 11934799Abstract: Combinatorial logic circuits with feedback, which include at least two combinatorial logic elements, are disclosed. At least one of the combinatorial logic elements receives an external input (i.e., from outside the circuit), at least one of the combinatorial logic elements receives an input that is feedback of the circuit output, and at least one of the combinatorial logic elements receives an input that is neither an external input nor an output of the circuit but rather is from another of the combinatorial logic elements and thus only “implicit” to the circuit. No staticizers are needed; the logic circuits effectively create implicit equations to perform functions that were previously thought to require sequential logic. The combinatorial logic circuits result in a stable output (in some instances after a brief period of time) due to the implicit equations, rather than achieving stability from an explicit expression of some input to the circuit.Type: GrantFiled: August 12, 2021Date of Patent: March 19, 2024Assignee: SiliconIntervention Inc.Inventor: A. Martin Mallinson
-
Patent number: 11928176Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Or, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix and to generate the output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.Type: GrantFiled: November 24, 2020Date of Patent: March 12, 2024Assignee: Arm LimitedInventors: Zhi-Gang Liu, Paul Nicholas Whatmough, Matthew Mattina
-
Patent number: 11914974Abstract: According to some embodiments, a system comprises a generator of a truly random signal is connected to an input and feedback device for the purpose of providing a user with real time feedback on the random signal. The user observes a representation of the signal in the process of an external physical event for the purpose of finding a correlation between the random output and what happens during the physical event. In some examples, the system is preferably designed such the system is shielded from all classically known forces such as gravity, physical pressure, motion, electromagnetic fields, humidity, etc. and/or, such classical forces are factored out of the process as much as possible. The system is thus designed to be selectively response to signals from living creatures, in particular, humans.Type: GrantFiled: November 30, 2020Date of Patent: February 27, 2024Assignee: Psyleron, Inc.Inventors: John Valentino, Herb Mertz, Ian Cook
-
Patent number: 11907330Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.Type: GrantFiled: February 17, 2023Date of Patent: February 20, 2024Assignee: Google LLCInventors: Andrew Everett Phelps, Norman Paul Jouppi