Patents Examined by Andrew Caldwell
-
Patent number: 11966450
Abstract: According to an embodiment, a calculation device includes a memory and one or more processors configured to update, for elements each associated with first and second variables, the first and second variables for each unit time, sequentially for the unit times and alternately between the first and second variables. In a calculation process for each unit time, the one or more processors are configured to: for each of the elements, update the first variable based on the second variable; update the second variable based on the first variables of the elements; when the first variable is smaller than a first value, change the first variable to a value of the first value or more and a threshold value or less; and when the first variable is greater than a second value, change the first variable to a value of the threshold value or more and the second value or less.
Type: Grant
Filed: February 25, 2021
Date of Patent: April 23, 2024
Assignee: Kabushiki Kaisha Toshiba
Inventor: Hayato Goto
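The per-unit-time update described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function name `update_step`, the step size `dt`, the `coupling` matrix, and the specific force term are all assumptions introduced here. Out-of-range first variables are set to the nearest bound, which is one value the abstract's ranges allow as long as the threshold lies between the first and second values.

```python
def update_step(x, y, coupling, dt=0.1, first_value=-1.0, second_value=1.0):
    """One unit-time update for all elements (hypothetical sketch)."""
    n = len(x)
    # Update each first variable from its own second variable.
    x = [x[i] + dt * y[i] for i in range(n)]
    # Update each second variable from the first variables of all elements.
    # The force term below is an illustrative assumption, not from the abstract.
    y = [
        y[i] + dt * (-x[i] + sum(coupling[i][j] * x[j] for j in range(n)))
        for i in range(n)
    ]
    # Move any first variable below first_value up to first_value, and any
    # first variable above second_value down to second_value.
    x = [min(max(xi, first_value), second_value) for xi in x]
    return x, y
```

Calling `update_step` repeatedly advances the elements one unit time per call, alternating between the two variables inside each call.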
-
Patent number: 11960856
Abstract: A system and/or an integrated circuit including a multiplier-accumulator execution pipeline which includes a plurality of MACs to implement a plurality of multiply and accumulate operations. A first memory stores filter weights having a Gaussian floating point (“GFP”) data format and a first bit length. A data format conversion circuitry includes circuitry to convert the filter weights from the GFP data format and the first bit length to filter weights having the data format and bit length that are different from the GFP data format and the first bit length. The converted filter weights are output to the MACs, wherein in operation, the MACs are configured to perform the plurality of multiply operations using (a) the input data and (b) the filter weights having the data format and bit length that are different from the GFP data format and the first bit length, respectively.
Type: Grant
Filed: January 4, 2021
Date of Patent: April 16, 2024
Assignee: Flex Logix Technologies, Inc.
Inventor: Frederick A. Ware
-
Patent number: 11775303
Abstract: Disclosed is a general-purpose computing accelerator which includes a memory including an instruction cache, a first executing unit performing a first computation operation, a second executing unit performing a second computation operation, an instruction fetching unit fetching an instruction stored in the instruction cache, a decoding unit that decodes the instruction, and a state control unit controlling a path of the instruction depending on an operation state of the second executing unit. The decoding unit provides the instruction to the first executing unit when the instruction is of a first type and provides the instruction to the state control unit when the instruction is of a second type. Depending on the operation state of the second executing unit, the state control unit provides the instruction of the second type to the second executing unit or stores the instruction of the second type as a register file in the memory.
Type: Grant
Filed: September 1, 2021
Date of Patent: October 3, 2023
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventor: Jeongmin Yang
-
Patent number: 11762664
Abstract: There is provided a data processing apparatus comprising decode circuitry responsive to receipt of a block of instructions to generate control signals indicative of each of the block of instructions, and to analyse the block of instructions to detect a potential hazard instruction. The data processing apparatus is provided with decode circuitry to encode information indicative of a clean restart point into the control signals associated with the potential hazard instruction. The data processing apparatus is provided with data processing circuitry to perform out-of-order execution of at least some of the block of instructions, and control circuitry responsive to a determination, at execution of the potential hazard instruction, that data values used as operands for the potential hazard instruction have been modified by out-of-order execution of a subsequent instruction, to restart execution from the clean restart point and to flush held data values from the data processing circuitry.
Type: Grant
Filed: January 5, 2022
Date of Patent: September 19, 2023
Assignee: Arm Limited
Inventors: Yasuo Ishii, Michael David Achenbach, David Gum Lim, Abhishek Raja
-
Patent number: 11720356
Abstract: An apparatus comprises an instruction decoder to decode instructions, processing circuitry to perform data processing in response to the instructions decoded by the instruction decoder, and memory attribute checking circuitry to check whether a memory access request issued by the processing circuitry satisfies access permissions specified in a plurality of memory attribute entries. Each memory attribute entry specifies access permissions for a corresponding address region of variable size within an address space. In response to a range checking instruction specifying address identifying parameters for identifying a first address and a second address, the instruction decoder controls the processing circuitry to set, in at least one software-accessible storage location, a status value indicative of whether the first address and the second address correspond to the same memory attribute entry.
Type: Grant
Filed: August 20, 2019
Date of Patent: August 8, 2023
Assignee: Arm Limited
Inventor: Thomas Christopher Grocutt
-
Patent number: 11720360
Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor included in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.
Type: Grant
Filed: September 8, 2021
Date of Patent: August 8, 2023
Assignee: Apple Inc.
Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
-
Patent number: 11687347
Abstract: A microprocessor and a method for issuing a load/store instruction are introduced. The microprocessor includes a decode/issue unit, a load/store queue, a scoreboard, and a load/store unit. The scoreboard includes a plurality of scoreboard entries, in which each scoreboard entry includes an unknown bit value and a count value, wherein the unknown bit value or the count value is set when instructions are issued. The decode/issue unit checks for WAR, WAW, and RAW data dependencies from the scoreboard and dispatches load/store instructions to the load/store queue with the recorded scoreboard values. The load/store queue is configured to resolve the data dependencies and dispatch the load/store instructions to the load/store unit for execution.
Type: Grant
Filed: May 25, 2021
Date of Patent: June 27, 2023
Assignee: ANDES TECHNOLOGY CORPORATION
Inventor: Thang Minh Tran
-
Patent number: 11023208
Abstract: A true random number generator includes a latch circuit, a noise circuit coupled to the latch circuit and an equalization circuit coupled to the inputs of the latch circuit, the equalization circuit being configured to maintain the latch circuit in a balanced state and to allow the latch circuit to resolve from a metastable state based on a timing control. A method of generating a random number output includes maintaining a latch circuit in a balanced state by turning on an equalization circuit coupled to the inputs of the latch circuit, coupling at least one noise source to the latch circuit, allowing the latch circuit to resolve from a metastable state by turning off the equalization circuit and repeatedly turning the equalization circuit on and off based on a timing control.
Type: Grant
Filed: January 23, 2019
Date of Patent: June 1, 2021
Assignee: International Business Machines Corporation
Inventors: Chitra K. Subramanian, Ghavam G. Shahidi
-
Patent number: 11023205
Abstract: Negative zero control for execution of an instruction. A process obtains an instruction to perform operation(s) using an input value. The instruction includes a negative zero control indicator indicating whether negative zero control is enabled for execution of the instruction. The process executes the instruction, the executing including performing the operation(s) using the input value to obtain a result having a sign, determining whether to control the sign of the result, the determining being based at least in part on the negative zero control indicator being set to a defined value, and performing further processing, as part of executing the instruction, based on the determining.
Type: Grant
Filed: February 15, 2019
Date of Patent: June 1, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Cedric Lichtenau, Reid Copeland, Petra Leber, Silvia M. Mueller, Jonathan D. Bradbury, Xin Guo
-
Patent number: 11023209
Abstract: An electric device for a hardware random number generator is provided. The hardware random number generator comprises one or more bitcells, each of which comprises a first pair of a first transistor and a first tunable resistor and a second pair of a second transistor and a second tunable resistor, with the first pair cross-coupled with the second pair.
Type: Grant
Filed: January 25, 2019
Date of Patent: June 1, 2021
Assignee: International Business Machines Corporation
Inventor: Kangguo Cheng
-
Patent number: 11018692
Abstract: Computer-implemented methods, systems, and devices to perform lossless compression of floating point format time-series data are disclosed. A first data value may be obtained in floating point format representative of an initial time-series parameter, for example, an output checkpoint of a computer simulation of a real-world event such as weather prediction or nuclear reaction simulation. A first predicted value may be determined representing the parameter at a first checkpoint time. A second data value may be obtained from the simulation. A prediction error may be calculated. Another predicted value may be generated for a next point in time and may be adjusted by the previously determined prediction error (e.g., to increase accuracy of the subsequent prediction). When a third data value is obtained, the adjusted prediction value may be used to generate a difference (e.g., XOR) for storing in a compressed data store to represent the third data value.
Type: Grant
Filed: July 29, 2020
Date of Patent: May 25, 2021
Assignee: Hewlett Packard Enterprise Development LP
Inventors: Anirban Nag, Naveen Muralimanohar, Paolo Faraboschi
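The predict, adjust, and XOR idea in this abstract can be illustrated with a small sketch. It rests on assumptions that are not in the abstract: the base predictor is "repeat the last value", the prediction error is measured against that base predictor, and the function names are hypothetical. The residuals would normally be passed to an entropy or leading-zero coder, which is omitted here.

```python
import struct

def float_to_bits(value):
    """Reinterpret a 64-bit float as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def bits_to_float(bits):
    """Reinterpret an unsigned 64-bit integer as a float."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def compress(values):
    """Return one XOR residual per value (hypothetical sketch)."""
    residuals = []
    previous, error = 0.0, 0.0  # assumed initial predictor state
    for value in values:
        # Base prediction (last value) adjusted by the previous error.
        predicted = previous + error
        residuals.append(float_to_bits(value) ^ float_to_bits(predicted))
        # Error of the unadjusted "last value" predictor for this step.
        error = value - previous
        previous = value
    return residuals

def decompress(residuals):
    """Rebuild the exact original values from the residuals."""
    values = []
    previous, error = 0.0, 0.0
    for residual in residuals:
        predicted = previous + error
        value = bits_to_float(residual ^ float_to_bits(predicted))
        values.append(value)
        error = value - previous
        previous = value
    return values
```

Because the decompressor recomputes the same predictions from already-decoded values, the round trip is bit-exact for finite inputs, and an exact prediction yields a residual of zero.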
-
Patent number: 11016731
Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
Type: Grant
Filed: March 29, 2019
Date of Patent: May 25, 2021
Assignee: Intel Corporation
Inventors: Amit Gradstein, Simon Rubanovich, Zeev Sperber
-
Patent number: 11010635
Abstract: A method for processing electronic data includes the steps of transforming the electronic data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; and processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices.
Type: Grant
Filed: September 10, 2018
Date of Patent: May 18, 2021
Assignee: City University of Hong Kong
Inventors: Hing Cheung So, Wen-Jun Zeng, Jiayi Chen, Abdelhak M. Zoubir
-
Patent number: 11010131
Abstract: An integrated circuit may include a floating-point adder. The adder may be implemented using a dual-path adder architecture having a near path and a far path. The near path may include a leading zero anticipator (LZA), a comparison circuit for comparing an exponent value to an LZA count, and associated circuitry for handling subnormal numbers. The far path may include a subtraction circuit for computing the difference between a received exponent value and a minimum exponent value, at least two shifters for shifting far greater and far lesser mantissa values in parallel, and associated circuitry for handling subnormal numbers. The adder may be dynamically configured to support a first mode that processes FP16 at inputs and outputs, a second mode that processes modified FP16′ inputs, and a third mode that processes FP16′ at inputs and outputs.
Type: Grant
Filed: September 14, 2017
Date of Patent: May 18, 2021
Assignee: Intel Corporation
Inventors: Martin Langhammer, Bogdan Pasca
-
Patent number: 11010662
Abstract: Massively parallel neural inference computing elements are provided. A plurality of multipliers is arranged in a plurality of equal-sized groups. Each of the plurality of multipliers is adapted to, in parallel, apply a weight to an input activation to generate an output. A plurality of adders is operatively coupled to one of the groups of multipliers. Each of the plurality of adders is adapted to, in parallel, add the outputs of the multipliers within its associated group to generate a partial sum. A plurality of function blocks is operatively coupled to one of the plurality of adders. Each of the plurality of function blocks is adapted to, in parallel, apply a function to the partial sum of its associated adder to generate an output value.
Type: Grant
Filed: March 4, 2020
Date of Patent: May 18, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta, Steven K. Esser, Myron D. Flickner, Jennifer Klamo, Dharmendra S. Modha, Hartmut Penner, Jun Sawada, Brian Taba
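The dataflow in this abstract (grouped multipliers, one adder per group, one function block per adder) can be modeled sequentially as a rough sketch. The hardware operates in parallel; the loop below only mirrors the structure. The names `grouped_inference` and `relu`, and the choice of ReLU as the applied function, are assumptions made for illustration.

```python
def relu(value):
    """Example function block; the abstract does not name a specific function."""
    return value if value > 0 else 0

def grouped_inference(weight_groups, activation_groups, function=relu):
    """Model of multiplier groups feeding adders and function blocks."""
    outputs = []
    for weights, activations in zip(weight_groups, activation_groups):
        # Multipliers: apply each weight to its input activation.
        products = [w * a for w, a in zip(weights, activations)]
        # Adder: sum the outputs of the multipliers in this group.
        partial_sum = sum(products)
        # Function block: apply the function to the partial sum.
        outputs.append(function(partial_sum))
    return outputs
```

Each iteration of the outer loop corresponds to one group, so in hardware all iterations would run at the same time.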
-
Patent number: 11003985
Abstract: Provided is a convolutional neural network system including a data selector configured to output an input value corresponding to a position of a sparse weight from among input values of input data on a basis of a sparse index indicating the position of a nonzero value in a sparse weight kernel, and a multiply-accumulate (MAC) computator configured to perform a convolution computation on the input value output from the data selector by using the sparse weight kernel.
Type: Grant
Filed: November 7, 2017
Date of Patent: May 11, 2021
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Jin Kyu Kim, Byung Jo Kim, Seong Min Kim, Ju-Yeob Kim, Mi Young Lee, Joo Hyun Lee
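A one-dimensional sketch may help picture the data selector and MAC stage described here. It assumes the sparse kernel is stored as a list of nonzero weights plus a matching list of their positions (the sparse index); the function name, the 1-D layout, and the absence of padding are all illustrative assumptions rather than details from the abstract.

```python
def sparse_convolve(inputs, sparse_weights, sparse_index, kernel_size):
    """1-D convolution that only touches inputs under nonzero weights."""
    outputs = []
    for start in range(len(inputs) - kernel_size + 1):
        accumulator = 0
        for weight, position in zip(sparse_weights, sparse_index):
            # Data selector: pick the input at the nonzero weight's position.
            selected = inputs[start + position]
            # MAC: multiply by the sparse weight and accumulate.
            accumulator += weight * selected
        outputs.append(accumulator)
    return outputs
```

For a kernel `[2, 0, -1]`, the call would pass `sparse_weights=[2, -1]` and `sparse_index=[0, 2]`, so the zero weight costs no multiplication.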
-
Patent number: 10997277
Abstract: An integrated circuit device such as a neural network accelerator can be programmed to select a numerical value based on a multinomial distribution. In various examples, the integrated circuit device can include an execution engine that includes multiple separate execution units. The multiple execution units can operate in parallel on different streams of data. For example, to make a selection based on a multinomial distribution, the execution units can be configured to perform cumulative sums on sets of numerical values, where the numerical values represent probabilities. In this example, to then obtain cumulative sums across the sets of numerical values, the largest values from the sets can be accumulated, and then added, in parallel to the sets. The resulting cumulative sum across all the numerical values can then be used to randomly select a specific index, which can provide a particular numerical value as the selected value.
Type: Grant
Filed: March 26, 2019
Date of Patent: May 4, 2021
Assignee: Amazon Technologies, Inc.
Inventors: Yu Zhou, Vignesh Vivekraja, Ron Diamant
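The two-level cumulative sum in this abstract can be sketched in plain Python. Each inner list stands in for the stream handled by one execution unit; the function names and the use of a caller-supplied uniform number `u` in [0, 1) are assumptions for illustration, and every set is assumed to be non-empty with non-negative entries.

```python
from itertools import accumulate

def global_cumulative_sum(sets):
    """Combine per-set cumulative sums into one running sum over all values."""
    # Step 1: each execution unit computes a cumulative sum over its own set.
    local = [list(accumulate(values)) for values in sets]
    # Step 2: the largest (last) value of each set is accumulated across sets.
    totals = [sums[-1] for sums in local]
    offsets = [0] + list(accumulate(totals))[:-1]
    # Step 3: each set adds the total of all preceding sets to its local sums.
    return [value + offset for sums, offset in zip(local, offsets) for value in sums]

def select_index(sets, u):
    """Pick the first index whose cumulative sum exceeds u times the total."""
    cumulative = global_cumulative_sum(sets)
    target = u * cumulative[-1]
    for index, value in enumerate(cumulative):
        if target < value:
            return index
    return len(cumulative) - 1
```

In practice `u` would come from a random number generator, for example `random.random()`.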
-
Patent number: 10997275
Abstract: A method for an associative memory array includes storing each column of a matrix in an associated column of the associative memory array, where each bit in row j of the matrix is stored in row R-matrix-row-j of the array, storing a vector in each associated column, where a bit j from the vector is stored in an R-vector-bit-j row of the array. The method includes simultaneously activating a vector-matrix pair of rows R-vector-bit-j and R-matrix-row-j to concurrently receive a result of a Boolean function on all associated columns, using the results to calculate a product between the vector-matrix pair of rows, and writing the product to an R-product-j row in the array.
Type: Grant
Filed: March 23, 2017
Date of Patent: May 4, 2021
Assignee: GSI Technology Inc.
Inventors: Avidan Akerib, Pat Lasserre
-
Patent number: 10990355
Abstract: The present innovative solution solves the problem of generating pseudo-random numbers that have practically infinite period, while requiring limited processing resources and operating significantly faster than known pseudo-random number generators. A sequence of pseudo-random numbers is created by a linear congruential generator using a large seed number and the sequence is used to create a big number. The big number is formed by raising each of at least two pseudo-random numbers and their sum to the same power. The big number is then selectively split into a sequence of aperiodic pseudo-random numbers which are output for use in any suitable application and for seeding the present generator.
Type: Grant
Filed: July 10, 2020
Date of Patent: April 27, 2021
Inventor: Panagiotis Andreadakis
-
Patent number: 10984074
Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix, generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.
Type: Grant
Filed: February 24, 2020
Date of Patent: April 20, 2021
Assignee: Intel Corporation
Inventors: Srinivasan Narayanamoorthy, Nadathur Rajagopalan Satish, Alexey Suprun, Kenneth J. Janik
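The arithmetic that this instruction performs can be written out as a short software sketch. The storage format is an assumption: the sparse source matrix is represented here as a dictionary mapping `(row, column)` to its non-zero value, and the function name is hypothetical. The abstract describes hardware circuitry, not this code.

```python
def sparse_dense_multiply_accumulate(sparse, dense, output):
    """Accumulate sparse @ dense into output, visiting only non-zero elements."""
    for (m, k), element in sparse.items():
        # Multiply the non-zero element at (M, K) by every element of dense row K.
        for n, dense_value in enumerate(dense[k]):
            # Add the product to the previous value of the output at (M, N).
            output[m][n] += element * dense_value
    return output
```

For example, with `sparse = {(0, 1): 2.0, (1, 0): 3.0}`, `dense = [[1.0, 2.0], [3.0, 4.0]]`, and an output initialized to all ones, the result is `[[7.0, 9.0], [4.0, 7.0]]`.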