Patents Examined by Eric Coleman
-
Patent number: 11966738
Abstract: A technology for flushing a translation lookaside buffer (TLB) according to a designated key identification code (designated key ID). An instruction of an instruction set architecture is proposed to flush the TLB according to the designated key ID. A decoder transforms the instruction into at least one microinstruction. According to a flushing microinstruction included in the at least one microinstruction, a designated key ID is supplied to a control logic circuit of the TLB through a memory order buffer, so that the control logic circuit flushes matched entries in the TLB, wherein the matched entries match the designated key ID.
Type: Grant
Filed: October 14, 2022
Date of Patent: April 23, 2024
Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
Inventors: Weilin Wang, Yingbing Guan, Yue Qin
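The selective flush described in the abstract for patent 11966738 can be pictured with a small software model. This is only an illustrative sketch: the `TlbEntry` fields and the `flush_by_key_id` function are hypothetical names, and the patent's actual mechanism is a hardware control logic circuit fed through a memory order buffer, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    # Hypothetical entry layout; real TLB entries carry more state.
    virtual_page: int
    physical_page: int
    key_id: int
    valid: bool = True


def flush_by_key_id(tlb, key_id):
    """Invalidate every valid entry tagged with key_id; return how many were flushed."""
    flushed = 0
    for entry in tlb:
        if entry.valid and entry.key_id == key_id:
            entry.valid = False
            flushed += 1
    return flushed
```

Entries tagged with other key IDs stay valid, which is the point of flushing by key ID rather than flushing the whole TLB.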
-
Patent number: 11947487
Abstract: Methods and systems are disclosed for performing dataflow execution by an accelerated processing unit (APU). Techniques disclosed include decoding information from one or more dataflow instructions. The decoded information is associated with dataflow execution of a computational task. Techniques disclosed further include configuring, based on the decoded information, dataflow circuitry, and, then, executing the dataflow execution of the computational task using the dataflow circuitry.
Type: Grant
Filed: June 28, 2022
Date of Patent: April 2, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Johnathan Robert Alsop, Karthik Ramu Sangaiah, Anthony T. Gutierrez
-
Patent number: 11947967
Abstract: An example system implementing a processing-in-memory pipeline includes: a memory array to store a plurality of look-up tables (LUTs) and data; a control block coupled to the memory array, the control block to control a computational pipeline by activating one or more LUTs of the plurality of LUTs; and a logic array coupled to the memory array and the control block, the logic array to perform, based on control inputs received from the control block, logic operations on the activated LUTs and the data.
Type: Grant
Filed: August 1, 2022
Date of Patent: April 2, 2024
Inventor: Dmitri Yudanov
-
Patent number: 11947960
Abstract: Certain aspects of the present disclosure provide techniques and apparatus for performing mathematical operations on processing units based on data in the modulo space. An example method includes receiving a binary-space input to process (e.g., using a neural network or other processing system). The binary-space input is converted into a modulo-space input based on a set of coprimes defined for executing operations in a modulo space. A modulo-space result is generated through one or more modulo-space multiply-and-accumulate (MAC) units based on the modulo-space input. The modulo-space result is converted into a binary-space result, and the binary-space result is output.
Type: Grant
Filed: November 4, 2022
Date of Patent: April 2, 2024
Assignee: QUALCOMM Incorporated
Inventors: Edwin Chongwoo Park, Ravishankar Sivalingam
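The binary-to-modulo-to-binary flow in the abstract for patent 11947960 resembles a residue number system, and a minimal Python sketch may help make it concrete. The moduli below, the function names, and the use of the Chinese Remainder Theorem for the conversion back are assumptions for illustration; the abstract does not give these details. The sketch only handles non-negative integers whose result stays below the product of the moduli.

```python
from math import prod

# Hypothetical pairwise-coprime moduli (251 is prime, 253 = 11*23,
# 255 = 3*5*17, 256 = 2**8). The patent does not specify values.
MODULI = (251, 253, 255, 256)


def to_modulo_space(x, moduli=MODULI):
    """Convert a non-negative binary-space integer into one residue per modulus."""
    return tuple(x % m for m in moduli)


def modulo_mac(acc, a, b, moduli=MODULI):
    """Multiply-and-accumulate independently in each residue channel."""
    return tuple((r + p * q) % m for r, p, q, m in zip(acc, a, b, moduli))


def to_binary_space(residues, moduli=MODULI):
    """Recombine residues with the Chinese Remainder Theorem."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total


def dot_product(xs, ws):
    """Dot product computed entirely with modulo-space MAC steps."""
    acc = to_modulo_space(0)
    for x, w in zip(xs, ws):
        acc = modulo_mac(acc, to_modulo_space(x), to_modulo_space(w))
    return to_binary_space(acc)
```

Because each residue channel works on small numbers and never exchanges carries with the others, the channels can run in parallel, which is the usual motivation for this kind of representation.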
-
Patent number: 11947961
Abstract: According to some example embodiments of the present disclosure, in a method for a memory lookup mechanism in a high-bandwidth memory system, the method includes: using a memory die to conduct a multiplication operation using a lookup table (LUT) methodology by accessing a LUT, which includes floating point operation results, stored on the memory die; sending, by the memory die, a result of the multiplication operation to a logic die including a processor and a buffer; and conducting, by the logic die, a matrix multiplication operation using computation units.
Type: Grant
Filed: November 30, 2022
Date of Patent: April 2, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
-
Patent number: 11934482
Abstract: A processing device includes a two-dimensional array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The device further includes interconnections among the two-dimensional array of processing elements to provide direct communication among neighboring processing elements of the two-dimensional array of processing elements. A processing element of the two-dimensional array of processing elements is connected to a first neighbor processing element that is immediately adjacent the processing element in a first dimension of the two-dimensional array. The processing element is further connected to a second neighbor processing element that is immediately adjacent the processing element in a second dimension of the two-dimensional array.
Type: Grant
Filed: August 3, 2023
Date of Patent: March 19, 2024
Assignee: UNTETHER AI CORPORATION
Inventor: William Martin Snelgrove
-
Patent number: 11934828
Abstract: A method for accessing stored entities (SEs) that are stored in a storage unit of a storage system, the method may include determining in a cyclic manner, by each compute node (CN) of a group of compute nodes, CN SEs budgets to be used in a cycle, based on a shared storage space that stores performance requests of CNs of the group.
Type: Grant
Filed: November 29, 2022
Date of Patent: March 19, 2024
Assignee: VAST DATA LTD.
Inventors: Ron Mandel, Mirit Shalem
-
Patent number: 11928465
Abstract: A system and an accelerator circuit including a register file comprising instruction registers to store an instruction for evaluating an elementary function, and data registers comprising a first data register to store an input value. The accelerator circuit further includes a successive cumulative rotation circuit comprising a reconfigurable inner stage to perform a successive cumulative rotation recurrence, and a determination circuit to determine a type of the elementary function based on the instruction, and responsive to determining that the input value is a fixed-point number, configure the reconfigurable inner stage to a configuration for evaluating the type of the elementary function, wherein the successive cumulative rotation circuit is to calculate an evaluation of the elementary function using the reconfigurable inner stage performing the successive cumulative rotation recurrence.
Type: Grant
Filed: February 20, 2020
Date of Patent: March 12, 2024
Inventors: Mayan Moudgill, Pablo Balzola, Murugappan Senthivelan, Vaidyanathan Ramdurai, Sitij Agrawal
-
Patent number: 11921667
Abstract: A reconfigurable computing chip, a method for configuring the reconfigurable computing chip, a method for convolution process, a device for convolution process, a computer readable storage medium and a computer program product are provided. The reconfigurable computing chip comprises a processing module including multiple processing cores sharing a first cache, wherein each of the plurality of processing cores includes multiple processing elements sharing a second cache, each of the plurality of processing elements monopolizes a third cache corresponding to said processing element, wherein the reconfigurable computing chip is dynamically configured to perform convolution process on an input feature map and a convolution kernel to obtain an output feature map, and each of the multiple processing elements is dynamically configured to perform a multiplication-plus-addition process on a part of the input feature map and a part of the convolution kernel to obtain a part of the output feature map.
Type: Grant
Filed: December 8, 2022
Date of Patent: March 5, 2024
Assignee: BEIJING ESWIN COMPUTING TECHNOLOGY CO., LTD.
Inventor: Yang Huang
-
Patent number: 11921636
Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
Type: Grant
Filed: October 25, 2022
Date of Patent: March 5, 2024
Assignee: Texas Instruments Incorporated
Inventor: Joseph Zbiciak
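To illustrate the nested-loop addressing described in the abstract for patent 11921636, the following Python sketch generates the address sequence from a list of per-loop counts and dimensions. The function name, the `(count, dim)` tuple layout, and the reading of "loop dimension" as a byte stride are assumptions; the stream template's bit-level format definition field is not modeled.

```python
from itertools import product


def stream_addresses(base, loops):
    """Yield element addresses for nested loops.

    loops is a list of (count, dim) pairs ordered from the innermost loop
    to the outermost, where dim is assumed to be the stride in bytes that
    one step of that loop adds to the address.
    """
    outer_first = list(reversed(loops))
    ranges = [range(count) for count, _ in outer_first]
    # product() varies the last range fastest, i.e. the innermost loop.
    for indices in product(*ranges):
        offset = sum(i * dim for i, (_, dim) in zip(indices, outer_first))
        yield base + offset
```

For example, `list(stream_addresses(100, [(3, 4), (2, 32)]))` walks three elements 4 bytes apart, then repeats that walk 32 bytes further on.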
-
Patent number: 11915005
Abstract: A data processing apparatus includes receive circuitry that receives an indication of a trigger block of instructions.
Type: Grant
Filed: October 5, 2022
Date of Patent: February 27, 2024
Assignee: Arm Limited
Inventors: Chang Joo Lee, Michael Brian Schinzler, Yasuo Ishii, Sergio Schuler
-
Patent number: 11914548
Abstract: A computing device determines a node traversal order for computing a computational parameter value for each node of a data model of a system that includes a plurality of disconnected graphs. The data model represents a flow of a computational parameter value through the nodes from a source module to an end module. A flow list defines an order for selecting and iteratively processing each node to compute the computational parameter value in a single iteration through the flow list. Each node from the flow list is selected to compute a driver quantity for each node. Each node is selected from the flow list in a reverse order to compute a driver rate and the computational parameter value for each node. The driver quantity or the computational parameter value is output for each node to predict a performance of the system.
Type: Grant
Filed: June 8, 2023
Date of Patent: February 27, 2024
Assignee: SAS Institute Inc.
Inventor: Shyam Kashinath Khatkale
-
Patent number: 11915001
Abstract: A neural processor and a method for fetching instructions thereof are provided. The neural processor includes a local memory in which weights, input activations, and partial sums are stored, a processing unit configured to compute the weights, the input activations, and the partial sums, and a local memory load unit configured to load the weights, the input activations, and the partial sums from the local memory into the processing unit, wherein the local memory load unit includes an instruction fetch unit configured to fetch instructions included in a program of the local memory load unit for loading any one of the weights, the input activations, or the partial sums from the local memory, and an instruction execution unit configured to generate control signals for executing instructions fetched by the instruction fetch unit.
Type: Grant
Filed: September 28, 2023
Date of Patent: February 27, 2024
Assignee: Rebellions Inc.
Inventor: Minhoo Kang
-
Patent number: 11900108
Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
Type: Grant
Filed: August 30, 2021
Date of Patent: February 13, 2024
Assignee: Intel Corporation
Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
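The rotate described in the abstract for patent 11900108 depends only on the source operand and the rotate amount. A minimal Python sketch of such a flag-independent rotate is shown below; the function name, the choice of a right rotate, and the default 32-bit width are assumptions made for illustration.

```python
def rotate_right(value, amount, width=32):
    """Rotate value right by amount bits within a width-bit word.

    The result is a pure function of the inputs: no carry flag or other
    processor state is read or written.
    """
    mask = (1 << width) - 1
    amount %= width
    value &= mask
    return ((value >> amount) | (value << (width - amount))) & mask
```

Because no flag is consumed, consecutive rotates have no dependency on each other through the flags, which is the property the abstract highlights.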
-
Patent number: 11899613
Abstract: A packaging technology to improve performance of an AI processing system resulting in an ultra-high bandwidth system. An IC package is provided which comprises: a substrate; a first die on the substrate, and a second die stacked over the first die. The first die can be a first logic die (e.g., a compute chip, CPU, GPU, etc.) while the second die can be a compute chiplet comprising ferroelectric or paraelectric logic. Both dies can include ferroelectric or paraelectric logic. The ferroelectric/paraelectric logic may include AND gates, OR gates, complex gates, majority, minority, and/or threshold gates, sequential logic, etc. The IC package can be in a 3D or 2.5D configuration that implements logic-on-logic stacking configuration. The 3D or 2.5D packaging configurations have chips or chiplets designed to have time distributed or spatially distributed processing. The logic of chips or chiplets is segregated so that one chip in a 3D or 2.5D stacking arrangement is hot at a time.
Type: Grant
Filed: August 20, 2021
Date of Patent: February 13, 2024
Assignee: KEPLER COMPUTING INC.
Inventors: Amrita Mathuriya, Christopher B. Wilkerson, Rajeev Kumar Dokania, Debo Olaosebikan, Sasikanth Manipatruni
-
Patent number: 11886985
Abstract: A processor-implemented data processing method includes: generating compressed data of first matrix data based on information of a distance between valid elements included in the first matrix data; fetching second matrix data based on the compressed data; and generating output matrix data based on the compressed data and the second matrix data.
Type: Grant
Filed: July 28, 2022
Date of Patent: January 30, 2024
Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
Inventors: Yuhwan Ro, Byeongho Kim, Jaehyun Park, Jungho Ahn, Minbok Wi, Sunjung Lee, Eojin Lee, Wonkyung Jung, Jongwook Chung, Jaewan Choi
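One way to picture the three steps in the abstract for patent 11886985 is a distance-based sparse encoding followed by a matrix product that fetches only the rows it needs. The Python sketch below is an assumption-laden illustration: it treats non-zero values as the "valid elements", stores each one with its row-major distance from the previous valid element, and uses hypothetical function names. The patent's actual encoding may differ.

```python
def compress(matrix):
    """Encode a matrix as (distance, value) pairs in row-major order.

    distance counts positions since the previous valid (non-zero) element;
    the first distance is measured from position -1.
    """
    compressed = []
    last = -1
    position = 0
    for row in matrix:
        for value in row:
            if value != 0:
                compressed.append((position - last, value))
                last = position
            position += 1
    return compressed


def multiply(compressed, n_rows_a, n_cols_a, b):
    """Compute A @ B from the compressed form of A and a dense B."""
    n_cols_b = len(b[0])
    out = [[0] * n_cols_b for _ in range(n_rows_a)]
    position = -1
    for distance, value in compressed:
        position += distance
        i, k = divmod(position, n_cols_a)
        b_row = b[k]  # only rows of B that meet a valid element are fetched
        for j in range(n_cols_b):
            out[i][j] += value * b_row[j]
    return out
```

Storing distances instead of absolute indices keeps the metadata small when valid elements are close together, and rows of the second matrix that would only be multiplied by zeros are never read.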
-
Patent number: 11880684
Abstract: Provided are a Reduced Instruction Set Computer-Five (RISC-V)-based artificial intelligence inference method and system. The RISC-V-based artificial intelligence inference method includes the following steps: acquiring an instruction and data of artificial intelligence inference by means of a Direct Memory Access (DMA) interface, and writing the instruction and the data into a memory; acquiring the instruction from the memory and translating the instruction, and loading the data from the memory to a corresponding register on the basis of the instruction; in response to the instruction being a vector instruction, processing, by a convolution control unit, corresponding vector data in a vector processing unit on the basis of the vector instruction; and feeding back the processed vector data to complete inference.
Type: Grant
Filed: September 30, 2021
Date of Patent: January 23, 2024
Assignee: INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD.
Inventor: Zhaorong Jia
-
Patent number: 11880682
Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
Type: Grant
Filed: June 30, 2021
Date of Patent: January 23, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
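The bit-length reduction in the abstract for patent 11880682 can be sketched in Python by dropping trailing bits of a 32-bit float. The choice of FP32 as the first bit-length, the 16 dropped bits (a bfloat16-like result), the round-half-up rule, and all function names are assumptions; the abstract does not name specific formats. The rounding variant does not handle NaN, infinity, or values at the top of the exponent range.

```python
import struct


def float_to_bits(x):
    """Reinterpret a Python float as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reduce_truncate(x, drop=16):
    """Reduce bit-length by zeroing the lowest `drop` trailing bits."""
    keep_mask = 0xFFFFFFFF & ~((1 << drop) - 1)
    return bits_to_float(float_to_bits(x) & keep_mask)


def reduce_round(x, drop=16):
    """Reduce bit-length, rounding half up in magnitude before truncating."""
    keep_mask = 0xFFFFFFFF & ~((1 << drop) - 1)
    half = 1 << (drop - 1)
    bits = (float_to_bits(x) + half) & 0xFFFFFFFF
    return bits_to_float(bits & keep_mask)


def processing_element(acc, reduced_input, reduced_weight):
    """One multiply-accumulate step on already-reduced operands."""
    return acc + reduced_input * reduced_weight
```

For instance, `1.00390625` (that is, 1 + 2**-8) truncates to `1.0` but rounds to `1.0078125`, which shows why a reducer might round before dropping the trailing bits.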
-
Patent number: 11880683
Abstract: Systems, apparatuses, and methods for efficiently processing arithmetic operations are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
Type: Grant
Filed: October 31, 2017
Date of Patent: January 23, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling
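The operand selection in the abstract for patent 11880683 can be modeled in Python by treating each 32-bit register as two packed half-precision values and using one selector bit per source. The sketch assumes M = 32 and N = 16, and the function names and the addition operation are illustrative choices, not details taken from the patent.

```python
import struct


def half_to_bits(x):
    """Encode a float as a 16-bit IEEE 754 half-precision pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_to_half(bits):
    """Decode a 16-bit half-precision pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def pack_register(low, high):
    """Build a 32-bit register value holding two half-precision numbers."""
    return (half_to_bits(high) << 16) | half_to_bits(low)


def select_half(register, use_high):
    """Pick the high or low 16-bit portion, as an instruction selector bit would."""
    return (register >> 16) & 0xFFFF if use_high else register & 0xFFFF


def packed_add(reg_a, reg_b, sel_a, sel_b):
    """Add the selected halves of two registers; the result is rounded to half precision."""
    a = bits_to_half(select_half(reg_a, sel_a))
    b = bits_to_half(select_half(reg_b, sel_b))
    return bits_to_half(half_to_bits(a + b))
```

With per-operand selector bits, an instruction can combine the high half of one register with the low half of another without a separate shuffle step.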
-
Patent number: 11874793
Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast hubs for multi-processor arrangements. A processing tile may comprise a broadcast hub to obtain a plurality of parameters applicable in a particular operation from at least one of a plurality of processing tiles and initiate distribution of the plurality of parameters to the plurality of processing tiles, wherein the plurality of processing tiles may execute the particular operation based at least in part on the plurality of distributed parameters.
Type: Grant
Filed: March 30, 2022
Date of Patent: January 16, 2024
Assignee: Arm Limited
Inventors: Erik Persson, Graeme Leslie Ingram, Rune Holm, John Wakefield Brothers, III