Patents Examined by Eric Coleman
  • Patent number: 11966738
    Abstract: A technology for flushing a translation lookaside buffer (TLB) according to a designated key identification code (designated key ID). An instruction of an instruction set architecture is proposed to flush the TLB according to the designated key ID. A decoder transforms the instruction into at least one microinstruction. According to a flushing microinstruction included in the at least one microinstruction, a designated key ID is supplied to a control logic circuit of the TLB through a memory order buffer, so that the control logic circuit flushes matched entries in the TLB, wherein the matched entries match the designated key ID.
    Type: Grant
    Filed: October 14, 2022
    Date of Patent: April 23, 2024
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Weilin Wang, Yingbing Guan, Yue Qin
  • Patent number: 11947487
    Abstract: Methods and systems are disclosed for performing dataflow execution by an accelerated processing unit (APU). Techniques disclosed include decoding information from one or more dataflow instructions. The decoded information is associated with dataflow execution of a computational task. Techniques disclosed further include configuring, based on the decoded information, dataflow circuitry, and, then, executing the dataflow execution of the computational task using the dataflow circuitry.
    Type: Grant
    Filed: June 28, 2022
    Date of Patent: April 2, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Johnathan Robert Alsop, Karthik Ramu Sangaiah, Anthony T. Gutierrez
  • Patent number: 11947967
    Abstract: An example system implementing a processing-in-memory pipeline includes: a memory array to store a plurality of look-up tables (LUTs) and data; a control block coupled to the memory array, the control block to control a computational pipeline by activating one or more LUTs of the plurality of LUTs; and a logic array coupled to the memory array and the control block, the logic array to perform, based on control inputs received from the control block, logic operations on the activated LUTs and the data.
    Type: Grant
    Filed: August 1, 2022
    Date of Patent: April 2, 2024
    Inventor: Dmitri Yudanov
  • Patent number: 11947960
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for performing mathematical operations on processing units based on data in the modulo space. An example method includes receiving a binary-space input to process (e.g., using a neural network or other processing system). The binary-space input is converted into a modulo-space input based on a set of coprimes defined for executing operations in a modulo space. A modulo-space result is generated through one or more modulo-space multiply-and-accumulate (MAC) units based on the modulo-space input. The modulo-space result is converted into a binary-space result, and the binary-space result is output.
    Type: Grant
    Filed: November 4, 2022
    Date of Patent: April 2, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Edwin Chongwoo Park, Ravishankar Sivalingam
  • Patent number: 11947961
    Abstract: According to some example embodiments of the present disclosure, in a method for a memory lookup mechanism in a high-bandwidth memory system, the method includes: using a memory die to conduct a multiplication operation using a lookup table (LUT) methodology by accessing a LUT, which includes floating point operation results, stored on the memory die; sending, by the memory die, a result of the multiplication operation to a logic die including a processor and a buffer; and conducting, by the logic die, a matrix multiplication operation using computation units.
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: April 2, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
  • Patent number: 11934482
    Abstract: A processing device includes a two-dimensional array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The device further includes interconnections among the two-dimensional array of processing elements to provide direct communication among neighboring processing elements of the two-dimensional array of processing elements. A processing element of the two-dimensional array of processing elements is connected to a first neighbor processing element that is immediately adjacent the processing element in a first dimension of the two-dimensional array. The processing element is further connected to a second neighbor processing element that is immediately adjacent the processing element in a second dimension of the two-dimensional array.
    Type: Grant
    Filed: August 3, 2023
    Date of Patent: March 19, 2024
    Assignee: UNTETHER AI CORPORATION
    Inventor: William Martin Snelgrove
  • Patent number: 11934828
    Abstract: A method for accessing stored entities (SEs) that are stored in a storage unit of a storage system, the method may include determining in a cyclic manner, by each compute node (CN) of a group of compute nodes, CN SEs budgets to be used in a cycle, based on a shared storage space that stores performance requests of CNs of the group.
    Type: Grant
    Filed: November 29, 2022
    Date of Patent: March 19, 2024
    Assignee: VAST DATA LTD.
    Inventors: Ron Mandel, Mirit Shalem
  • Patent number: 11928465
    Abstract: A system and an accelerator circuit including a register file comprising instruction registers to store an instruction for evaluating an elementary function, and data registers comprising a first data register to store an input value. The accelerator circuit further includes a successive cumulative rotation circuit comprising a reconfigurable inner stage to perform a successive cumulative rotation recurrence, and a determination circuit to determine a type of the elementary function based on the instruction, and responsive to determining that the input value is a fixed-point number, configure the reconfigurable inner stage to a configuration for evaluating the type of the elementary function, wherein the successive cumulative rotation circuit is to calculate an evaluation of the elementary function using the reconfigurable inner stage performing the successive cumulative rotation recurrence.
    Type: Grant
    Filed: February 20, 2020
    Date of Patent: March 12, 2024
    Inventors: Mayan Moudgill, Pablo Balzola, Murugappan Senthivelan, Vaidyanathan Ramdurai, Sitij Agrawal
  • Patent number: 11921667
    Abstract: A reconfigurable computing chip, a method for configuring the reconfigurable computing chip, a method for convolution process, a device for convolution process, a computer readable storage medium and a computer program product are provided. The reconfigurable computing chip comprises a processing module including multiple processing cores sharing a first cache, wherein each of the plurality of processing cores includes multiple processing elements sharing a second cache, each of the plurality of processing elements monopolizes a third cache corresponding to said processing element, wherein the reconfigurable computing chip is dynamically configured to perform convolution process on an input feature map and a convolution kernel to obtain an output feature map, and each of the multiple processing elements is dynamically configured to perform a multiplication-plus-addition process on a part of the input feature map and a part of the convolution kernel to obtain a part of the output feature map.
    Type: Grant
    Filed: December 8, 2022
    Date of Patent: March 5, 2024
    Assignee: BEIJING ESWIN COMPUTING TECHNOLOGY CO., LTD.
    Inventor: Yang Huang
  • Patent number: 11921636
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: October 25, 2022
    Date of Patent: March 5, 2024
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
  • Patent number: 11915005
    Abstract: A data processing apparatus includes receive circuitry that receives an indication of a trigger block of instructions.
    Type: Grant
    Filed: October 5, 2022
    Date of Patent: February 27, 2024
    Assignee: Arm Limited
    Inventors: Chang Joo Lee, Michael Brian Schinzler, Yasuo Ishii, Sergio Schuler
  • Patent number: 11914548
    Abstract: A computing device determines a node traversal order for computing a computational parameter value for each node of a data model of a system that includes a plurality of disconnected graphs. The data model represents a flow of a computational parameter value through the nodes from a source module to an end module. A flow list defines an order for selecting and iteratively processing each node to compute the computational parameter value in a single iteration through the flow list. Each node from the flow list is selected to compute a driver quantity for each node. Each node is selected from the flow list in a reverse order to compute a driver rate and the computational parameter value for each node. The driver quantity or the computational parameter value is output for each node to predict a performance of the system.
    Type: Grant
    Filed: June 8, 2023
    Date of Patent: February 27, 2024
    Assignee: SAS Institute Inc.
    Inventor: Shyam Kashinath Khatkale
  • Patent number: 11915001
    Abstract: A neural processor and a method for fetching instructions thereof are provided. The neural processor includes a local memory in which weights, input activations, and partial sums are stored, a processing unit configured to compute the weights, the input activations, and the partial sums, and a local memory load unit configured to load the weights, the input activations, and the partial sums from the local memory into the processing unit, wherein the local memory load unit includes an instruction fetch unit configured to fetch instructions included in a program of the local memory load unit for loading any one of the weights, the input activations, or the partial sums from the local memory, and an instruction execution unit configured to generate control signals for executing instructions fetched by the instruction fetch unit.
    Type: Grant
    Filed: September 28, 2023
    Date of Patent: February 27, 2024
    Assignee: Rebellions Inc.
    Inventor: Minhoo Kang
  • Patent number: 11900108
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: August 30, 2021
    Date of Patent: February 13, 2024
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
  • Patent number: 11899613
    Abstract: A packaging technology to improve performance of an AI processing system resulting in an ultra-high bandwidth system. An IC package is provided which comprises: a substrate; a first die on the substrate, and a second die stacked over the first die. The first die can be a first logic die (e.g., a compute chip, CPU, GPU, etc.) while the second die can be a compute chiplet comprising ferroelectric or paraelectric logic. Both dies can include ferroelectric or paraelectric logic. The ferroelectric/paraelectric logic may include AND gates, OR gates, complex gates, majority, minority, and/or threshold gates, sequential logic, etc. The IC package can be in a 3D or 2.5D configuration that implements logic-on-logic stacking configuration. The 3D or 2.5D packaging configurations have chips or chiplets designed to have time distributed or spatially distributed processing. The logic of chips or chiplets is segregated so that one chip in a 3D or 2.5D stacking arrangement is hot at a time.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: February 13, 2024
    Assignee: KEPLER COMPUTING INC.
    Inventors: Amrita Mathuriya, Christopher B. Wilkerson, Rajeev Kumar Dokania, Debo Olaosebikan, Sasikanth Manipatruni
  • Patent number: 11886985
    Abstract: A processor-implemented data processing method includes: generating compressed data of first matrix data based on information of a distance between valid elements included in the first matrix data; fetching second matrix data based on the compressed data; and generating output matrix data based on the compressed data and the second matrix data.
    Type: Grant
    Filed: July 28, 2022
    Date of Patent: January 30, 2024
    Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
    Inventors: Yuhwan Ro, Byeongho Kim, Jaehyun Park, Jungho Ahn, Minbok Wi, Sunjung Lee, Eojin Lee, Wonkyung Jung, Jongwook Chung, Jaewan Choi
  • Patent number: 11880684
    Abstract: Provided are a Reduced Instruction Set Computer-Five (RISC-V)-based artificial intelligence inference method and system. The RISC-V-based artificial intelligence inference method includes the following steps: acquiring an instruction and data of artificial intelligence inference by means of a Direct Memory Access (DMA) interface, and writing the instruction and the data into a memory; acquiring the instruction from the memory and translating the instruction, and loading the data from the memory to a corresponding register on the basis of the instruction; in response to the instruction being a vector instruction, processing, by a convolution control unit, corresponding vector data in a vector processing unit on the basis of the vector instruction; and feeding back the processed vector data to complete inference.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: January 23, 2024
    Assignee: INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD.
    Inventor: Zhaorong Jia
  • Patent number: 11880682
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: January 23, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
  • Patent number: 11880683
    Abstract: Systems, apparatuses, and methods for efficiently processing arithmetic operations are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: January 23, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling
  • Patent number: 11874793
    Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast hubs for multi-processor arrangements. A processing tile may comprise a broadcast hub to obtain a plurality of parameters applicable in a particular operation from at least one of a plurality of processing tiles and initiate distribution of the plurality of parameters to the plurality of processing tiles, wherein the plurality of processing tiles may execute the particular operation based at least in part on the plurality of distributed parameters.
    Type: Grant
    Filed: March 30, 2022
    Date of Patent: January 16, 2024
    Assignee: Arm Limited
    Inventors: Erik Persson, Graeme Leslie Ingram, Rune Holm, John Wakefield Brothers, III