For Loops, e.g., Loop Buffer (EPO) Patents (Class 712/E9.058)
  • Patent number: 11921636
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: October 25, 2022
    Date of Patent: March 5, 2024
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
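
The idea in the abstract above (one set of template bits, read under different layouts) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the 64-bit payload, the two layouts in `FORMATS`, the field ordering, and the names `decode_template` and `stream_addresses`.

```python
from itertools import product

# Hypothetical layouts: format id -> (number of loops, bits per field).
# Both layouts consume the same 64 payload bits.
FORMATS = {0: (2, 16), 1: (4, 8)}


def decode_template(fmt, payload):
    """Split the payload into per-loop counts and dimensions (byte strides).

    Assumed field order: all loop counts first (innermost loop first),
    then all loop dimensions, packed from the least significant bit.
    """
    n_loops, width = FORMATS[fmt]
    mask = (1 << width) - 1
    fields = [(payload >> (i * width)) & mask for i in range(2 * n_loops)]
    return fields[:n_loops], fields[n_loops:]


def stream_addresses(base, counts, dims):
    """Yield one address per data element, innermost loop varying fastest."""
    if len(counts) != len(dims):
        raise ValueError("each loop needs one count and one dimension")
    # product() varies its last argument fastest, so pass the loops
    # outermost-first and pair each index with the matching stride.
    for idx in product(*(range(c) for c in reversed(counts))):
        yield base + sum(i * d for i, d in zip(idx, reversed(dims)))


# Two loops: 4 elements 4 bytes apart, repeated on 2 rows 64 bytes apart.
payload = 4 | (2 << 16) | (4 << 32) | (64 << 48)
counts, dims = decode_template(0, payload)
addresses = list(stream_addresses(0x1000, counts, dims))
```

Decoding the same `payload` with format 1 yields four loops with 8-bit fields, which is the trade-off the abstract describes: more loop levels at the cost of smaller counts and dimensions.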
  • Patent number: 11868163
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
    Type: Grant
    Filed: March 14, 2021
    Date of Patent: January 9, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11782725
    Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, a first node can include a tile cluster of N memory-compute tiles, and the N memory-compute tiles can be coupled using a first portion of a synchronous compute fabric. Operations performed by the respective processing and storage elements of the N memory-compute tiles can be selectively enabled or disabled based on information in a mask field of data propagated through the first portion of the synchronous compute fabric.
    Type: Grant
    Filed: August 16, 2021
    Date of Patent: October 10, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Bryan Hornung, Skyler Arron Windh
  • Patent number: 11709672
    Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
    Type: Grant
    Filed: December 16, 2019
    Date of Patent: July 25, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Yao Zhang, Bingrui Wang
  • Patent number: 11609760
    Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
    Type: Grant
    Filed: September 3, 2018
    Date of Patent: March 21, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Yao Zhang, Bingrui Wang
  • Patent number: 11442862
    Abstract: Disclosed herein are system, method, and computer program product embodiments for performing fair prefetching. An embodiment operates by splitting a data vector into a first subrange and a second subrange. The embodiment performs a first chance prefetch operation on the first subrange based on a fixed number of pages, thereby loading a set of pages of the first subrange into a main memory. The embodiment performs the first chance prefetch operation on the second subrange based on the fixed number of pages, thereby loading a first set of pages of the second subrange into the main memory. The embodiment performs a second chance prefetch operation on the second subrange based on the performing of the first chance prefetch operation on the second subrange, thereby loading a second set of pages of the second subrange into the main memory. The embodiment then executes the query.
    Type: Grant
    Filed: April 16, 2020
    Date of Patent: September 13, 2022
    Assignee: SAP SE
    Inventors: Robert Schulze, Adrian Dragusanu, Anup Ghatage, Colin Florendo, Mihnea Andrei, Randall Hammon, Sarika Iyer, Simhachala Sasikanth Gottapu, Yanhong Wang
  • Patent number: 11392383
    Abstract: Examples of the present disclosure relate to an apparatus comprising execution circuitry to execute instructions defining data processing operations on data items. The apparatus comprises cache storage to store temporary copies of the data items. The apparatus comprises prefetching circuitry to a) predict that a data item will be subject to the data processing operations by the execution circuitry by determining that the data item is consistent with an extrapolation of previous data item retrieval by the execution circuitry, and identifying that at least one control flow element of the instructions indicates that the data item will be subject to the data processing operations by the execution circuitry; and b) prefetch the data item into the cache storage.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: July 19, 2022
    Assignee: Arm Limited
    Inventors: Ian Michael Caulfield, Peter Richard Greenhalgh, Frederic Claude Marie Piry, Albin Pierrick Tonnerre
  • Patent number: 10761820
    Abstract: A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: September 1, 2020
    Assignee: Cray, Inc.
    Inventors: Heidi Poxon, John Levesque, Luiz DeRose, Brian H. Johnson
  • Patent number: 9026769
    Abstract: A processor for processing loop instructions can include an instruction reorder structure and a loop processing controller. The instruction reorder structure is configured to store decoded instructions according to program order and issue the decoded instructions for execution out of program order. The loop processing controller is configured to detect a loop in the decoded instructions stored in the instruction reorder structure and cause the instruction reorder structure to reissue the decoded instructions that form the loop for re-execution.
    Type: Grant
    Filed: January 24, 2012
    Date of Patent: May 5, 2015
    Assignee: Marvell International Ltd.
    Inventors: Sujat Jamil, R. Frank O'Bleness, Joseph Delgross, Tom Hameenanttila
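
As a rough illustration of the loop detection and reissue described in the abstract above, the following Python sketch treats the instruction reorder structure as a list of decoded instructions in program order. The detection rule (a branch whose target is an earlier instruction still held in the structure) and all names (`Decoded`, `find_loop`, `reissue`) are assumptions for illustration, not the patented mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Decoded:
    pc: int                              # address of the instruction
    op: str                              # mnemonic, for display only
    branch_target: Optional[int] = None  # set only for branches


def find_loop(buffer: List[Decoded]) -> Optional[Tuple[int, int]]:
    """Return (start, end) buffer indices of the first loop found, else None.

    Assumed rule: a branch at index `end` whose target pc is not after its
    own pc and belongs to an instruction still in the buffer closes a loop.
    """
    index_of_pc = {ins.pc: i for i, ins in enumerate(buffer)}
    for end, ins in enumerate(buffer):
        target = ins.branch_target
        if target is not None and target <= ins.pc and target in index_of_pc:
            return index_of_pc[target], end
    return None


def reissue(buffer: List[Decoded], iterations: int) -> List[str]:
    """Replay the buffered loop body without fetching or decoding again."""
    span = find_loop(buffer)
    if span is None:
        return []
    start, end = span
    body = buffer[start:end + 1]
    return [ins.op for _ in range(iterations) for ins in body]


buffer = [
    Decoded(0, "mov"),
    Decoded(4, "add"),
    Decoded(8, "cmp"),
    Decoded(12, "bne", branch_target=4),  # backward branch to "add"
    Decoded(16, "ret"),
]
replayed = reissue(buffer, 2)
```

Here the loop body is the three instructions from `add` through `bne`, and `replayed` lists them twice in order.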
  • Patent number: 7877585
    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: January 25, 2011
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov
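
The disable mask in the abstract above can be pictured with a small Python simulation of a SIMD thread group running a loop in lockstep. This is only a sketch under stated assumptions: the per-thread trip counts, the function name `run_simd_loop`, and the policy of clearing the mask when the whole group leaves the loop are hypothetical and not drawn from the patent.

```python
from typing import List, Tuple


def run_simd_loop(trip_counts: List[int]) -> Tuple[List[int], int]:
    """Simulate one loop executed in lockstep by a SIMD thread group.

    trip_counts[t] is how many iterations thread t wants before it takes
    the conditional break. Returns (iterations done per thread, number of
    lockstep passes the group as a whole had to make).
    """
    n = len(trip_counts)
    disabled = [False] * n   # the disable mask: True = thread sits out
    done = [0] * n
    passes = 0
    while not all(disabled):
        for t in range(n):
            if disabled[t]:
                continue                 # masked-off thread does no work
            if passes >= trip_counts[t]:
                disabled[t] = True       # conditional break: mask the thread
            else:
                done[t] += 1             # loop body for an active thread
        passes += 1
    # Assumed reconvergence point: every thread has left the loop, so the
    # mask is cleared and the group continues together.
    disabled = [False] * n
    return done, passes


done, passes = run_simd_loop([3, 1, 2, 0])
```

With these inputs each thread performs exactly its own number of iterations, while the group makes four passes: three for the longest-running thread plus the pass in which that thread takes its break.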