For Loops, e.g., Loop Buffer (EPO) Patents (Class 712/E9.058)

Method for improving accuracy of loop branch prediction

Patent number: 12045621

Abstract: A method for improving the accuracy of a loop branch prediction algorithm by a bypass circuit, comprising: adding a bypass circuit to a loop branch prediction algorithm; and, for three program counters (PCs) entering a pipeline, enabling PC1 fetched in an IF0 stage to enter a hybrid branch predictor, and registering PC1; obtaining branch prediction information in an IF1 stage, and making a comparison in an IF2 stage to obtain a prediction result; registering the prediction result obtained in the IF2 stage; and processing PC2 and PC3 in the same way.

Type: Grant

Filed: February 28, 2023

Date of Patent: July 23, 2024

Assignees: Jiangsu Huachuang Microsystem Company Limited, Nanjing Research Institute of Electronics Technology

Inventors: Jiong Lou, Shiping Li, Sibo Yang, Ming Li, Wenjun Han, Zhiyong Lei
System, apparatus and method for prefetching physical pages in a processor

Patent number: 12019553

Abstract: In one embodiment, a processor includes: one or more execution circuits to execute instructions; a stream prediction circuit coupled to the one or more execution circuits, the stream prediction circuit to receive demand requests for information and, based at least in part on the demand requests, generate a page prefetch hint for a first page; and a prefetcher circuit to generate first prefetch requests each for a cache line, the stream prediction circuit decoupled from the prefetcher circuit. Other embodiments are described and claimed.

Type: Grant

Filed: December 22, 2020

Date of Patent: June 25, 2024

Assignee: Intel Corporation

Inventors: Hanna Alam, Joseph Nuzman
Streaming engine with flexible streaming engine template supporting differing number of nested loops with corresponding loop counts and loop offsets

Patent number: 11921636

Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.

Type: Grant

Filed: October 25, 2022

Date of Patent: March 5, 2024

Assignee: Texas Instruments Incorporated

Inventor: Joseph Zbiciak
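
The address generation and the format-dependent template decoding described in this abstract can be illustrated with a minimal Python sketch. It is not the patented hardware: the function names, the byte-stride meaning given to "loop dimension", the 32-bit count field, and the two formats in FORMATS are all assumptions made for illustration.

```python
from itertools import product
from typing import Iterator, List, Sequence

# Hypothetical formats: format id -> (number of loops, bits per loop count).
FORMATS = {0: (2, 16), 1: (4, 8)}


def decode_loop_counts(format_id: int, count_bits: int) -> List[int]:
    """Split one 32-bit field into loop counts, innermost loop in the low bits."""
    num_loops, width = FORMATS[format_id]
    mask = (1 << width) - 1
    return [(count_bits >> (i * width)) & mask for i in range(num_loops)]


def stream_addresses(base: int,
                     loop_counts: Sequence[int],
                     loop_dims: Sequence[int]) -> Iterator[int]:
    """Yield element addresses for a nested-loop stream.

    Index 0 describes the innermost loop. loop_dims are treated as
    byte strides, which is an assumption of this sketch.
    """
    if len(loop_counts) != len(loop_dims):
        raise ValueError("each loop needs one count and one dimension")
    # product() varies its last argument fastest, so list the outermost loop first.
    ranges = [range(count) for count in reversed(loop_counts)]
    strides = list(reversed(loop_dims))
    for indices in product(*ranges):
        yield base + sum(i * s for i, s in zip(indices, strides))
```

Under these assumptions, the same field 0x00030002 decodes to two 16-bit counts [2, 3] in format 0 and to four 8-bit counts [2, 0, 3, 0] in format 1, which is the kind of trade-off between loop count and field size that the abstract mentions.
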
Efficient loop execution for a multi-threaded, self-scheduling reconfigurable computing fabric

Patent number: 11868163

Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.

Type: Grant

Filed: March 14, 2021

Date of Patent: January 9, 2024

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer
Mask field propagation among memory-compute tiles in a reconfigurable architecture

Patent number: 11782725

Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, a first node can include a tile cluster of N memory-compute tiles, and the N memory-compute tiles can be coupled using a first portion of a synchronous compute fabric. Operations performed by the respective processing and storage elements of the N memory-compute tiles can be selectively enabled or disabled based on information in a mask field of data propagated through the first portion of the synchronous compute fabric.

Type: Grant

Filed: August 16, 2021

Date of Patent: October 10, 2023

Assignee: Micron Technology, Inc.

Inventors: Bryan Hornung, Skyler Arron Windh
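
As a rough software analogy for the mask field described in this abstract, the sketch below passes a value through a chain of tiles and lets one mask bit per tile decide whether that tile's operation runs. The function name run_tiles, the one-bit-per-tile encoding, and the strictly sequential chaining are assumptions for illustration, not details taken from the patent.

```python
from typing import Callable, Sequence


def run_tiles(data: int, mask: int, tile_ops: Sequence[Callable[[int], int]]) -> int:
    """Propagate data through the tiles; bit i of mask enables tile i."""
    value = data
    for tile_index, op in enumerate(tile_ops):
        if (mask >> tile_index) & 1:
            value = op(value)   # tile enabled: apply its operation
        # tile disabled: forward the value unchanged
    return value
```

With three hypothetical tiles that add 1, double, and subtract 3, a mask of 0b101 applied to the input 5 skips the middle tile and returns 3, while 0b111 returns 9.
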
Computing device and method

Patent number: 11709672

Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Type: Grant

Filed: December 16, 2019

Date of Patent: July 25, 2023

Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD

Inventors: Yao Zhang, Bingrui Wang
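
The abstract states that input data is represented as fixed-point data. A minimal sketch of what such a conversion can look like follows; the 16-bit width, the 8 fractional bits, the saturation behavior, and the function names are assumptions, since the abstract does not specify a format.

```python
def to_fixed(value: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Convert a float to a signed fixed-point integer, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(raw: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int = 8) -> int:
    """Multiply two fixed-point values using integer arithmetic only."""
    return (a * b) >> frac_bits
```

For example, to_fixed(1.5) gives 384, and from_fixed(fixed_mul(to_fixed(1.5), to_fixed(2.0))) gives 3.0 without any floating-point multiply.
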
Computing device and method

Patent number: 11609760

Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Type: Grant

Filed: September 3, 2018

Date of Patent: March 21, 2023

Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD

Inventors: Yao Zhang, Bingrui Wang
Fair prefetching in hybrid column stores

Patent number: 11442862

Abstract: Disclosed herein are system, method, and computer program product embodiments for performing fair prefetching. An embodiment operates by splitting a data vector into a first subrange and a second subrange. The embodiment performs a first chance prefetch operation on the first subrange based on a fixed number of pages, thereby loading a set of pages of the first subrange into a main memory. The embodiment performs the first chance prefetch operation on the second subrange based on the fixed number of pages, thereby loading a first set of pages of the second subrange into the main memory. The embodiment performs a second chance prefetch operation on the second subrange based on the performing the first chance prefetch operation on the second subrange, thereby loading a second set of pages of the second subrange into the main memory. The embodiment then executes the query.

Type: Grant

Filed: April 16, 2020

Date of Patent: September 13, 2022

Assignee: SAP SE

Inventors: Robert Schulze, Adrian Dragusanu, Anup Ghatage, Colin Florendo, Mihnea Andrei, Randall Hammon, Sarika Iyer, Simhachala Sasikanth Gottapu, Yanhong Wang
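
The two-pass scheme in this abstract can be sketched as follows. This is an illustrative simulation only: the function name, the representation of a subrange as a list of page ids, and the rule that a subrange gets a second chance when it has pages left after its fixed budget are assumptions rather than details confirmed by the abstract.

```python
from typing import Dict, List


def fair_prefetch_order(subranges: Dict[str, List[int]], fixed_pages: int) -> List[int]:
    """Return the order in which pages would be loaded into main memory."""
    load_order: List[int] = []
    second_chance: List[str] = []
    # First chance: every subrange gets the same fixed page budget.
    for name, pages in subranges.items():
        load_order.extend(pages[:fixed_pages])
        if len(pages) > fixed_pages:
            second_chance.append(name)
    # Second chance: subranges with pages left over load the remainder.
    for name in second_chance:
        load_order.extend(subranges[name][fixed_pages:])
    return load_order
```

Calling fair_prefetch_order({"first": [1, 2], "second": [10, 11, 12, 13]}, 2) returns [1, 2, 10, 11, 12, 13]: both subranges receive their first two pages before the second subrange loads its remaining pages.
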
Apparatus and method for prefetching data items

Patent number: 11392383

Abstract: Examples of the present disclosure relate to an apparatus comprising execution circuitry to execute instructions defining data processing operations on data items. The apparatus comprises cache storage to store temporary copies of the data items. The apparatus comprises prefetching circuitry to a) predict that a data item will be subject to the data processing operations by the execution circuitry by determining that the data item is consistent with an extrapolation of previous data item retrieval by the execution circuitry, and identifying that at least one control flow element of the instructions indicates that the data item will be subject to the data processing operations by the execution circuitry; and b) prefetch the data item into the cache storage.

Type: Grant

Filed: March 14, 2019

Date of Patent: July 19, 2022

Assignee: Arm Limited

Inventors: Ian Michael Caulfield, Peter Richard Greenhalgh, Frederic Claude Marie Piry, Albin Pierrick Tonnerre
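
The abstract combines two conditions before prefetching: the address must fit an extrapolation of earlier accesses, and a control flow element must indicate that the data will actually be used. A minimal sketch of that gating is shown below; the constant-stride test and the use of a remaining-iteration count as the control flow signal are assumptions chosen for illustration.

```python
from typing import List, Optional


def next_prefetch(history: List[int], iterations_left: int) -> Optional[int]:
    """Return an address to prefetch, or None if no prefetch should be issued."""
    if len(history) < 3:
        return None
    stride = history[-1] - history[-2]
    # Extrapolation check: the last two strides must agree and be non-zero.
    if stride == 0 or history[-2] - history[-3] != stride:
        return None
    # Control-flow check: skip the prefetch if the loop is about to exit.
    if iterations_left <= 0:
        return None
    return history[-1] + stride
```

With the access history [100, 108, 116], the sketch returns 124 while iterations remain and None once iterations_left reaches 0, so the final out-of-range line is never fetched.
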
Assisting parallelization of a computer program

Patent number: 10761820

Abstract: A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.

Type: Grant

Filed: December 22, 2015

Date of Patent: September 1, 2020

Assignee: Cray, Inc.

Inventors: Heidi Poxon, John Levesque, Luiz DeRose, Brian H. Johnson
Detecting and reissuing of loop instructions in reorder structure

Patent number: 9026769

Abstract: A processor for processing loop instructions can include an instruction reorder structure and a loop processing controller. The instruction reorder structure is configured to store decoded instructions according to program order and issue the decoded instructions for execution out of program order. The loop processing controller is configured to detect a loop in the decoded instructions stored in the instruction reorder structure and cause the instruction reorder structure to reissue the decoded instructions that form the loop for re-execution.

Type: Grant

Filed: January 24, 2012

Date of Patent: May 5, 2015

Assignee: Marvell International Ltd.

Inventors: Sujat Jamil, R. Frank O'Bleness, Joseph Delgross, Tom Hameenanttila
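
One simple way to picture the loop detection in this abstract is to scan the buffered instructions for a backward branch whose target is still held in the buffer. The sketch below does that in Python; the Decoded record, its fields, and the assumption that each program counter appears at most once in the buffer are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decoded:
    pc: int
    op: str
    branch_target: Optional[int] = None


def find_loop(buffer: List[Decoded]) -> Optional[List[Decoded]]:
    """Return the buffered instructions forming a loop, or None."""
    index_of = {inst.pc: i for i, inst in enumerate(buffer)}
    for i, inst in enumerate(buffer):
        target = inst.branch_target
        # A backward branch whose target is still buffered closes a loop.
        if target is not None and target <= inst.pc and target in index_of:
            start = index_of[target]
            if start <= i:
                return buffer[start:i + 1]
    return None
```

The returned slice is the set of entries that a controller could reissue for the next iteration instead of fetching and decoding them again.
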
Structured programming control flow in a SIMD architecture

Patent number: 7877585

Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.

Type: Grant

Filed: August 27, 2007

Date of Patent: January 25, 2011

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov
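
The role of a disable mask for break instructions can be illustrated with a small lane-by-lane simulation. This is a sketch under stated assumptions, not the patented design: the function name, the per-lane limits, and the omission of the state stack mentioned in the abstract are simplifications.

```python
from typing import List


def simd_loop_with_break(limits: List[int], max_iters: int) -> List[int]:
    """Each lane counts up until it reaches its own limit, then 'breaks'.

    All lanes step through the loop together; a per-lane disable mask
    suppresses lanes that have already executed the break.
    """
    counters = [0] * len(limits)
    disabled = [False] * len(limits)       # disable mask, one flag per lane
    for _ in range(max_iters):
        if all(disabled):                  # every lane has left the loop
            break
        for lane, limit in enumerate(limits):
            if disabled[lane]:
                continue                   # masked-off lane does no work
            if counters[lane] >= limit:
                disabled[lane] = True      # conditional break taken
            else:
                counters[lane] += 1
    return counters
```

For limits [0, 2, 5] and enough iterations, the lanes finish with counters [0, 2, 5]: each lane stops at its own break point while the group as a whole keeps iterating until every lane is disabled.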