For Loops, e.g., Loop Buffer (EPO) Patents (Class 712/E9.058)
-
Patent number: 12045621
Abstract: A method for improving an accuracy of a loop branch prediction algorithm by a bypass circuit, comprising: adding a bypass circuit to a loop branch prediction algorithm; and for three pcs entering a pipeline, enabling pc1 fetched in an if0 stage to enter a hybrid branch predictor, and registering pc1; obtaining branch prediction information in an if1 stage, and making a comparison in an if2 stage to obtain a prediction result, registering the prediction result obtained in the if2 stage, and processing pc2 and pc3 in a same way.
Type: Grant
Filed: February 28, 2023
Date of Patent: July 23, 2024
Assignees: Jiangsu Huachuang Microsystem Company Limited, Nanjing Research Institute of Electronics Technology
Inventors: Jiong Lou, Shiping Li, Sibo Yang, Ming Li, Wenjun Han, Zhiyong Lei
-
Patent number: 12019553
Abstract: In one embodiment, a processor includes: one or more execution circuits to execute instructions; a stream prediction circuit coupled to the one or more execution circuits, the stream prediction circuit to receive demand requests for information and, based at least in part on the demand requests, generate a page prefetch hint for a first page; and a prefetcher circuit to generate first prefetch requests each for a cache line, the stream prediction circuit decoupled from the prefetcher circuit. Other embodiments are described and claimed.
Type: Grant
Filed: December 22, 2020
Date of Patent: June 25, 2024
Assignee: Intel Corporation
Inventors: Hanna Alam, Joseph Nuzman
-
Patent number: 11921636
Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
Type: Grant
Filed: October 25, 2022
Date of Patent: March 5, 2024
Assignee: Texas Instruments Incorporated
Inventor: Joseph Zbiciak
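The nested-loop addressing and the reinterpretable template bits described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `stream_addresses` and `unpack_fields`, the byte-based dimensions, and the equal-width field layout are all assumptions made for illustration.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple


def stream_addresses(base: int, levels: Sequence[Tuple[int, int]]) -> Iterator[int]:
    """Yield element addresses for nested loops.

    `levels` is listed innermost loop first; each entry is
    (loop_count, loop_dimension_in_bytes). Both names are hypothetical.
    """
    counts = [count for count, _ in levels]
    dims = [dim for _, dim in levels]
    # product() varies its last argument fastest, so reverse the order
    # to make levels[0] the innermost (fastest-changing) loop.
    for idx in product(*(range(c) for c in reversed(counts))):
        yield base + sum(i * d for i, d in zip(idx, reversed(dims)))


def unpack_fields(template_bits: int, total_width: int, n_fields: int) -> List[int]:
    """Split the same template bits into n equal-width fields (assumed layout).

    More fields means more loops but narrower counts, which is the
    trade-off the abstract describes.
    """
    width = total_width // n_fields
    mask = (1 << width) - 1
    return [(template_bits >> (k * width)) & mask for k in range(n_fields)]


if __name__ == "__main__":
    # Inner loop: 4 elements, 4 bytes apart. Outer loop: 2 rows, 64 bytes apart.
    print([hex(a) for a in stream_addresses(0x1000, [(4, 4), (2, 64)])])
    # The same 32 bits read as two 16-bit counts or four 8-bit counts.
    print(unpack_fields(0x00020004, 32, 2), unpack_fields(0x00020004, 32, 4))
```

In this sketch a format value would simply select `n_fields`; the actual encoding used by the patented stream template is not stated in the abstract.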
-
Patent number: 11868163
Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
Type: Grant
Filed: March 14, 2021
Date of Patent: January 9, 2024
Assignee: Micron Technology, Inc.
Inventor: Tony M. Brewer
-
Patent number: 11782725
Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, a first node can include a tile cluster of N memory-compute tiles, and the N memory-compute tiles can be coupled using a first portion of a synchronous compute fabric. Operations performed by the respective processing and storage elements of the N memory-compute tiles can be selectively enabled or disabled based on information in a mask field of data propagated through the first portion of the synchronous compute fabric.
Type: Grant
Filed: August 16, 2021
Date of Patent: October 10, 2023
Assignee: Micron Technology, Inc.
Inventors: Bryan Hornung, Skyler Arron Windh
-
Patent number: 11709672
Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
Type: Grant
Filed: December 16, 2019
Date of Patent: July 25, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
Inventors: Yao Zhang, Bingrui Wang
-
Patent number: 11609760
Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
Type: Grant
Filed: September 3, 2018
Date of Patent: March 21, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
Inventors: Yao Zhang, Bingrui Wang
-
Patent number: 11442862
Abstract: Disclosed herein are system, method, and computer program product embodiments for performing fair prefetching. An embodiment operates by splitting a data vector into a first subrange and a second subrange. The embodiment performs a first chance prefetch operation on the first subrange based on a fixed number of pages, thereby loading a set of pages of the first subrange into a main memory. The embodiment performs the first chance prefetch operation on the second subrange based on the fixed number of pages, thereby loading a first set of pages of the second subrange into the main memory. The embodiment performs a second chance prefetch operation on the second subrange based on the performing the first chance prefetch operation on the second subrange, thereby loading a second set of pages of the second subrange into the main memory. The embodiment then executes the query.
Type: Grant
Filed: April 16, 2020
Date of Patent: September 13, 2022
Assignee: SAP SE
Inventors: Robert Schulze, Adrian Dragusanu, Anup Ghatage, Colin Florendo, Mihnea Andrei, Randall Hammon, Sarika Iyer, Simhachala Sasikanth Gottapu, Yanhong Wang
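The two-pass ordering in this abstract (a first chance for every subrange, then a second chance for the second subrange) can be sketched as follows. This is only an illustration under stated assumptions: the function `fair_prefetch`, the `second_chance` page budget, and the representation of pages as integers are hypothetical, and the abstract does not say how the size of the second-chance pass is chosen.

```python
from typing import List, Sequence


def fair_prefetch(pages: Sequence[int], split: int,
                  first_chance: int, second_chance: int) -> List[int]:
    """Return page IDs in the order they would be loaded into main memory.

    `split` divides the data vector's pages into two subranges.
    `first_chance` is the fixed number of pages loaded per subrange.
    `second_chance` is an assumed budget for the follow-up pass.
    """
    first, second = pages[:split], pages[split:]
    loaded: List[int] = []
    # First-chance pass: each subrange gets the same fixed number of pages.
    loaded.extend(first[:first_chance])
    loaded.extend(second[:first_chance])
    # Second-chance pass: continue in the second subrange where the
    # first-chance pass stopped.
    loaded.extend(second[first_chance:first_chance + second_chance])
    return loaded


if __name__ == "__main__":
    print(fair_prefetch(list(range(10)), split=4, first_chance=2, second_chance=3))
```

The point of the ordering is that no subrange receives extra pages until every subrange has had its fixed first-chance share.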
-
Patent number: 11392383
Abstract: Examples of the present disclosure relate to an apparatus comprising execution circuitry to execute instructions defining data processing operations on data items. The apparatus comprises cache storage to store temporary copies of the data items. The apparatus comprises prefetching circuitry to a) predict that a data item will be subject to the data processing operations by the execution circuitry by determining that the data item is consistent with an extrapolation of previous data item retrieval by the execution circuitry, and identifying that at least one control flow element of the instructions indicates that the data item will be subject to the data processing operations by the execution circuitry; and b) prefetch the data item into the cache storage.
Type: Grant
Filed: March 14, 2019
Date of Patent: July 19, 2022
Assignee: Arm Limited
Inventors: Ian Michael Caulfield, Peter Richard Greenhalgh, Frederic Claude Marie Piry, Albin Pierrick Tonnerre
-
Patent number: 10761820
Abstract: A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.
Type: Grant
Filed: December 22, 2015
Date of Patent: September 1, 2020
Assignee: Cray, Inc.
Inventors: Heidi Poxon, John Levesque, Luiz DeRose, Brian H. Johnson
-
Patent number: 9026769
Abstract: A processor for processing loop instructions can include an instruction reorder structure and a loop processing controller. The instruction reorder structure is configured to store decoded instructions according to program order and issue the decoded instructions for execution out of program order. The loop processing controller is configured to detect a loop in the decoded instructions stored in the instruction reorder structure and cause the instruction reorder structure to reissue the decoded instructions that form the loop for re-execution.
Type: Grant
Filed: January 24, 2012
Date of Patent: May 5, 2015
Assignee: Marvell International Ltd.
Inventors: Sujat Jamil, R. Frank O'Bleness, Joseph Delgross, Tom Hameenanttila
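One common way to detect a loop among buffered instructions is to look for a backward branch whose target is still held in the buffer; the abstract does not say which detection method the patent uses, so the Python sketch below is an assumption. The tuple layout `(pc, opcode, branch_target)` and the function name `find_loop` are hypothetical.

```python
from typing import List, Optional, Tuple

# (pc, opcode, branch_target or None), stored in program order.
Instr = Tuple[int, str, Optional[int]]


def find_loop(buffer: List[Instr]) -> Optional[Tuple[int, int]]:
    """Return (start_index, end_index) of the first loop found, else None.

    A loop is assumed to be a branch whose target PC is not after the
    branch itself and is still present in the buffer.
    """
    index_of_pc = {pc: i for i, (pc, _, _) in enumerate(buffer)}
    for end, (pc, _, target) in enumerate(buffer):
        if target is not None and target <= pc and target in index_of_pc:
            return index_of_pc[target], end
    return None


if __name__ == "__main__":
    rob = [
        (0x100, "add", None),
        (0x104, "ld", None),
        (0x108, "sub", None),
        (0x10C, "bnez", 0x104),  # backward branch to 0x104
        (0x110, "st", None),
    ]
    span = find_loop(rob)
    if span is not None:
        start, end = span
        # Entries that could be reissued without refetching or redecoding.
        print([op for _, op, _ in rob[start:end + 1]])
```

Once the span is known, a controller could reissue those entries directly from the buffer, which is the behavior the abstract attributes to the loop processing controller.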
-
Patent number: 7877585
Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
Type: Grant
Filed: August 27, 2007
Date of Patent: January 25, 2011
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov
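The general idea of handling divergence with a stack of per-thread masks can be shown with a small Python simulation of one if/else over a thread group. It is a generic sketch, not the patented mechanism: the function `simd_if_else` and the stack entry layout are assumptions, and the disable mask for conditional return and break is not modeled.

```python
from typing import Callable, List, Optional, Tuple

Op = Callable[[int], int]
# Stack entry: (per-thread mask, operation to run, or None to reconverge).
Entry = Tuple[List[bool], Optional[Op]]


def simd_if_else(values: List[int], cond: Callable[[int], bool],
                 then_op: Op, else_op: Op) -> Tuple[List[int], List[bool]]:
    """Run `then_op` or `else_op` per thread, one path at a time."""
    n = len(values)
    out = list(values)
    active = [True] * n
    taken = [a and cond(v) for a, v in zip(active, out)]
    not_taken = [a and not t for a, t in zip(active, taken)]

    # Bottom entry restores the full mask once both paths have run.
    stack: List[Entry] = [(active, None)]
    if any(not_taken):
        stack.append((not_taken, else_op))  # deferred path
    if any(taken):
        stack.append((taken, then_op))      # runs first (top of stack)

    while stack:
        mask, op = stack.pop()
        if op is None:
            active = mask  # reconvergence point: all threads active again
            continue
        for i in range(n):
            if mask[i]:  # only threads enabled by the mask execute
                out[i] = op(out[i])
    return out, active


if __name__ == "__main__":
    result, mask = simd_if_else([1, 2, 3, 4], lambda v: v % 2 == 0,
                                lambda v: v * 10, lambda v: -v)
    print(result, mask)
```

Each path runs with only its own threads enabled, so the group executes both sides of the branch serially and then resumes with the full mask.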