Patents Examined by Keith E Vicary

Speculative buffer for speculative memory accesses with entries tagged with execution context identifiers

Patent number: 11210102

Abstract: An apparatus comprises processing circuitry to execute instructions from one or more of a plurality of execution contexts each associated with a respective execution context identifier; a cache; and a speculative buffer. Control circuitry controls allocation of data to the cache and the speculative buffer. A speculative entry, for which allocation is caused by a speculative memory access associated with a given execution context, is allocated to the speculative buffer instead of to the cache while the speculatively executed memory access instruction remains speculative. The speculative entry specifies, as a tagged execution context identifier, the execution context identifier associated with the given execution context. Presence of the speculative entry in the speculative buffer is prevented from being observable to execution contexts other than the execution context identified by the tagged execution context identifier.

Type: Grant

Filed: November 26, 2019

Date of Patent: December 28, 2021

Assignee: Arm Limited

Inventor: Roko Grubisic
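
Illustrative sketch (not taken from the patent): the abstract above describes speculative entries that are tagged with an execution context identifier and kept out of the shared cache. A minimal Python model of that visibility rule might look like the following. The class name, the method names, and the step that promotes an entry to the cache once speculation resolves are all assumptions made for illustration; the abstract only states the allocation and observability rules.

```python
class SpeculativeMemory:
    """Toy model: a shared cache plus a context-tagged speculative buffer."""

    def __init__(self):
        self.cache = {}        # addr -> data, observable by every context
        self.spec_buffer = {}  # (context_id, addr) -> data, tagged entries

    def speculative_load(self, context_id, addr, data):
        # Allocate to the speculative buffer instead of the cache,
        # tagging the entry with the requesting context's identifier.
        self.spec_buffer[(context_id, addr)] = data

    def lookup(self, context_id, addr):
        if addr in self.cache:
            return self.cache[addr]
        # Only an entry tagged with the caller's own context can hit;
        # entries of other contexts behave like a miss.
        return self.spec_buffer.get((context_id, addr))

    def resolve(self, context_id, addr, was_correct):
        # Assumption: once the access is no longer speculative, a correct
        # entry moves to the cache and a mispredicted one is dropped.
        data = self.spec_buffer.pop((context_id, addr), None)
        if data is not None and was_correct:
            self.cache[addr] = data
```

In this model a lookup from context 2 misses on an entry allocated by context 1 until that entry is resolved as correct.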

Instruction length based parallel instruction demarcator

Patent number: 11204768

Abstract: Instruction length based parallel instruction demarcators and methods for parallel instruction demarcation are included, wherein an instruction sequence is received at an instruction buffer, the instruction sequence comprising a plurality of instruction syllables, and the instruction sequence is stored at the instruction buffer. It is determined, using one or more logic blocks arranged in a sequence, a length of instructions and at least one boundary. Additionally, using a controlling logic block, the sequence is demarcated into individual instructions.

Type: Grant

Filed: August 12, 2020

Date of Patent: December 21, 2021

Inventor: Sitaram Yadavalli

Processor instruction specifying indexed storage region holding control data for swizzle operation

Patent number: 11188331

Abstract: A data processing system includes: a processor; a data interface for communication with a control unit, the processor being on one side of the data interface; internal storage accessible by the processor, the internal storage being on the same side of the data interface as the processor; and a register array accessible by the processor and comprising a plurality of registers, each register having a plurality of vector lanes. The storage is arranged to store control data indicating an ordered selection of vector lanes of one or more of the registers. The processor is arranged to, in response to receiving instruction data from a control unit, perform a swizzle operation in which data is selected from one or more source registers in the register array, and transferred to a destination register. The data is selected from vector lanes in accordance with control data stored in the internal storage.

Type: Grant

Filed: September 19, 2019

Date of Patent: November 30, 2021

Assignees: Arm Limited, Apical Limited

Inventors: Daren Croxford, Michel Patrick Gabriel Emil Iwaniec, Rune Holm, Diego Lopez Recas
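
Illustrative sketch (not taken from the patent): the swizzle operation in the abstract above selects vector lanes from source registers according to control data held in an indexed storage region. A minimal Python version, assuming the control data is an ordered list of (register index, lane index) pairs and that `internal_storage` is a plain dictionary keyed by region index, could be:

```python
def swizzle(source_registers, control_data):
    """Build the destination register from an ordered selection of lanes.

    source_registers: list of registers, each a list of lane values.
    control_data: ordered (register_index, lane_index) pairs; this
    encoding is an assumption, the abstract does not specify one.
    """
    return [source_registers[reg][lane] for reg, lane in control_data]


def execute_swizzle(source_registers, internal_storage, region_index):
    # The instruction names a storage region; the control data stored
    # there drives the lane selection.
    return swizzle(source_registers, internal_storage[region_index])
```

For example, with two four-lane source registers and control data `[(1, 3), (0, 0), (0, 0), (1, 1)]`, the destination receives lane 3 of register 1, lane 0 of register 0 twice, then lane 1 of register 1.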

Circular reconfiguration for reconfigurable parallel processor using a plurality of memory ports coupled to a commonly accessible memory unit

Patent number: 11182335

Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of reconfigurable units that may include a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of reconfigurable units may comprise a configuration buffer and a reconfiguration counter. The processor may further comprise a sequencer coupled to the configuration buffer of each of the plurality of reconfigurable units and configured to distribute a plurality of configurations to the plurality of reconfigurable units for the plurality of PEs and the plurality of MPs to execute a sequence of instructions.

Type: Grant

Filed: July 17, 2020

Date of Patent: November 23, 2021

Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.

Inventors: Jianbin Zhu, Yuan Li

Reconfigurable parallel processing with a temporary data storage coupled to a plurality of processing elements (PES) to store a PE execution result to be used by a PE during a next PE configuration

Patent number: 11182336

Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.

Type: Grant

Filed: July 17, 2020

Date of Patent: November 23, 2021

Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.

Inventors: Yuan Li, Jianbin Zhu

Private memory access for reconfigurable parallel processor using a plurality of memory ports each comprising an address calculation unit

Patent number: 11182333

Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each PE may have a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a different memory bank in the memory unit.

Type: Grant

Filed: June 19, 2020

Date of Patent: November 23, 2021

Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.

Inventors: Yuan Li, Jianbin Zhu

Shared memory access for reconfigurable parallel processor using a plurality of memory ports each comprising an address calculation unit

Patent number: 11182334

Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) each having a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a common area in the memory unit.

Type: Grant

Filed: July 16, 2020

Date of Patent: November 23, 2021

Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.

Inventors: Jianbin Zhu, Yuan Li

Reconfigurable parallel processing with various reconfigurable units to form two or more physical data paths and routing data from one physical data path to a gasket memory to be used in a future physical data path as input

Patent number: 11176085

Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.

Type: Grant

Filed: July 17, 2020

Date of Patent: November 16, 2021

Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.

Inventors: Yuan Li, Jianbin Zhu

Barrier-free atomic transfer of multiword data

Patent number: 11157330

Abstract: A barrier-free atomic transfer method of multiword data is described. In the barrier-free method, a producer processor deconstructs an original parameter set of data into a deconstructed parameter set; and performs a series of single-copy-atomic writes to a series of single-copy-atomic locations. Each single-copy-atomic location in the series of single-copy-atomic locations comprises a portion of the deconstructed parameter set and a sequence number. A consumer processor can read the series of single-copy-atomic locations; verifies that the sequence number for each single-copy-atomic location in the series of single-copy-atomic locations is consistent (e.g., are all the same sequence number); and reconstructs the portions of deconstructed parameter set into the original parameter set.

Type: Grant

Filed: May 15, 2019

Date of Patent: October 26, 2021

Assignee: ARM LIMITED

Inventor: Alasdair Grant
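
Illustrative sketch (not taken from the patent): the producer and consumer steps in the abstract above can be modelled in Python as follows. The 64-bit word layout (16-bit sequence number above a 48-bit payload portion) and the function names are assumptions for illustration, and plain Python integers only stand in for single-copy-atomic hardware locations.

```python
SEQ_BITS = 16
PAYLOAD_BITS = 48
PAYLOAD_BYTES = PAYLOAD_BITS // 8
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1
SEQ_MASK = (1 << SEQ_BITS) - 1


def deconstruct(params, seq):
    """Producer: split params (bytes) into portions, each packed with seq
    into one 64-bit word that would be written single-copy-atomically."""
    words = []
    for i in range(0, len(params), PAYLOAD_BYTES):
        chunk = params[i:i + PAYLOAD_BYTES].ljust(PAYLOAD_BYTES, b"\0")
        portion = int.from_bytes(chunk, "little")
        words.append(((seq & SEQ_MASK) << PAYLOAD_BITS) | portion)
    return words


def reconstruct(words, length):
    """Consumer: return the original bytes, or None when the sequence
    numbers disagree (the read overlapped an update and must be retried)."""
    seqs = {word >> PAYLOAD_BITS for word in words}
    if len(seqs) != 1:
        return None
    data = b"".join(
        (word & PAYLOAD_MASK).to_bytes(PAYLOAD_BYTES, "little") for word in words
    )
    return data[:length]
```

A consumer that sees a mix of words from two different updates gets `None` and can simply read again, which is what removes the need for a memory barrier in this sketch.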

Architecture and programming in a parallel processing environment with a tiled processor having a direct memory access controller

Patent number: 11157428

Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a processor, a switch including switching circuitry to forward data over data paths from other tiles to the processor and to switches of other tiles, and a switch memory that stores instruction streams that are able to operate independently for respective output ports of the switch. Also disclosed is a direct memory access (DMA) scheme in which sizes of DMA transfers are limited according to whether a cache miss has occurred.

Type: Grant

Filed: February 14, 2014

Date of Patent: October 26, 2021

Assignee: Massachusetts Institute of Technology

Inventor: Anant Agarwal

Non-cached loads and stores in a system having a multi-threaded, self-scheduling processor

Patent number: 11157286

Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute instructions; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In a representative embodiment, the processor core is further adapted to execute a non-cached load instruction to designate a general purpose register rather than a data cache for storage of data received from a memory circuit. The core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, and to generate one or more work descriptor data packets to another circuit for execution of corresponding execution threads.

Type: Grant

Filed: April 30, 2019

Date of Patent: October 26, 2021

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Computer architecture with fixed program dataflow elements and stream processor

Patent number: 11151077

Abstract: A hardware accelerator for computers combines a stand-alone, high-speed, fixed program dataflow functional element with a stream processor, the latter of which may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide them to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed program dataflow functional elements.

Type: Grant

Filed: June 28, 2017

Date of Patent: October 19, 2021

Assignee: Wisconsin Alumni Research Foundation

Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar

Providing hints to an execution unit to prepare for predicted subsequent arithmetic operations

Patent number: 11150721

Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.

Type: Grant

Filed: November 7, 2012

Date of Patent: October 19, 2021

Assignee: NVIDIA Corporation

Inventors: David Conrad Tannenbaum, Ming Y. Siu, Stuart F Oberman, Colin Sprinkle, Srinivasan Iyer, Ian Chi Yan Kwong

Branch destination prediction based on accord or discord of previous load data from a data cache line corresponding to a load instruction and present load data

Patent number: 11126435

Abstract: A processor device capable of raising a hit rate of branch destination prediction is provided. Every time a load instruction to a data cache is generated, an equivalent value judgment circuit judges accord/disaccord of present load data and previous load data from a corresponding line. In an N bit region, as history records, a judgment history record circuit records judgment results of N times by the equivalent value judgment circuit before a conditional branch instruction is generated. When the conditional branch instruction is generated, based on the history records in the N bit region, a branch prediction circuit predicts the same branch destination as the previous branch destination obtained by a previous execution result of the conditional branch instruction or a branch destination different from the previous destination. Further, the branch prediction circuit issues an instruction fetch direction of the predicted branch destination to a processor main-body circuit.

Type: Grant

Filed: March 8, 2018

Date of Patent: September 21, 2021

Assignee: RENESAS ELECTRONICS CORPORATION

Inventor: Masanao Sasai

Automated concurrency and repetition with minimal syntax

Patent number: 11113064

Abstract: A processor core receives a request to execute application code including a trigger instruction and an instruction block that reads a row of data values from a data structure and outputs a data value from a function using the row as input. The data structure is divided into multiple portions and the trigger instruction indicates that multiple instances of the instruction block are to be executed concurrently. In response to the request and to identification of the instruction block and trigger instruction, the processor core generates multiple instances of a support block that causes independent repetitive execution of each instance of the instruction block until all rows of the corresponding portion of the data structure are used as input. The processor core assigns instances of the instruction and support blocks to multiple processor cores, and provides each instance of the instruction block with the corresponding portion of the data structure.

Type: Grant

Filed: November 27, 2020

Date of Patent: September 7, 2021

Assignee: SAS INSTITUTE INC.

Inventors: Jack Joseph Rouse, Robert William Pratt, Jared Carl Erickson, Manoj Keshavmurthi Chari

Apparatus and method for maintaining prediction performance metrics for prediction components for each of a plurality of execution regions and implementing a prediction adjustment action based thereon

Patent number: 11099852

Abstract: An example apparatus comprises instruction execution circuitry and fetch circuitry to fetch, from memory, instructions for execution by the instruction execution circuitry. The fetch circuitry comprises a plurality of prediction components, each prediction component being configured to predict instructions in anticipation of the predicted instructions being required for execution by the instruction execution circuitry. The fetch circuitry is configured to fetch instructions in dependence on the predicting. The apparatus further comprises prediction tracking circuitry to maintain, for each of a plurality of execution regions, a prediction performance metric for each prediction component. The fetch circuitry is configured, based on at least one of the prediction performance metrics for a given execution region, to implement a prediction adjustment action in respect of at least one of the prediction components.

Type: Grant

Filed: October 25, 2018

Date of Patent: August 24, 2021

Assignee: ARM LIMITED

Inventors: Francisco João Feliciano Gaspar, Mohammadi Shabbirhussain Bharmal

Method for reducing fetch cycles for return-type instructions

Patent number: 11099849

Abstract: An apparatus includes a branch target cache configured to store one or more branch addresses, a memory configured to store a return target stack, and a circuit. The circuit may be configured to determine, for a group of one or more fetched instructions, a prediction value indicative of whether the group includes a return instruction. In response to the prediction value indicating that the group includes a return instruction, the circuit may be further configured to select a return address from the return target stack. The circuit may also be configured to determine a hit or miss indication in the branch target cache for the group, and to, in response to receiving a miss indication from the branch target cache, select the return address as a target address for the return instruction.

Type: Grant

Filed: September 1, 2016

Date of Patent: August 24, 2021

Assignee: Oracle International Corporation

Inventors: Yuan Chou, Manish Shah, Richa Aggarwal

Methods for partially preserving a branch predictor state

Patent number: 11093249

Abstract: In an embodiment, an apparatus includes a plurality of memories configured to store respective data in a plurality of branch prediction entries. Each branch prediction entry corresponds to at least one of a plurality of branch instructions. The apparatus also includes a control circuit configured to store first data associated with a first branch instruction into a corresponding branch prediction entry in at least one memory of the plurality of memories. The control circuit is further configured to select a first memory of the plurality of memories, to disconnect the first memory from a power supply in response to a detection of a first power mode signal, and to cease storing data in the plurality of memories in response to the detection of the first power mode signal.

Type: Grant

Filed: March 4, 2019

Date of Patent: August 17, 2021

Assignee: Apple Inc.

Inventors: Conrado Blasco, Brett S. Feero, David Williamson, Ian D. Kountanis, Shih-Chieh Wen

System having a hybrid threading processor, a hybrid threading fabric having configurable computing elements, and a hybrid interconnection network

Patent number: 11093251

Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. In a representative embodiment, a system includes an interconnection network, a processor, a host interface, and a configurable circuit cluster. The configurable circuit cluster may include a plurality of configurable circuits arranged in an array; an asynchronous packet network and a synchronous network coupled to each configurable circuit of the array; and a memory interface circuit and a dispatch interface circuit coupled to the asynchronous packet network and to the interconnection network. Each configurable circuit includes instruction or configuration memories for selection of a current data path configuration, a master synchronous network input, and a data path configuration for a next configurable circuit.

Type: Grant

Filed: October 31, 2018

Date of Patent: August 17, 2021

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Prefetch queue allocation protection bubble in a processor

Patent number: 11093248

Abstract: A computer system, processor, and method for processing information is disclosed that includes allocating a prefetch stream; providing a protection bubble to a plurality of cachelines for the allocated prefetch stream; accessing a cacheline; and preventing allocation of a different prefetch stream if the accessed cacheline is within the protection bubble. The system, processor and method in an aspect further includes providing a safety zone to a plurality of cachelines for the allocated prefetch stream, and advancing the prefetch stream if the accessed cacheline is one of the plurality of cachelines in the safety zone. In an embodiment, the number of cachelines within the safety zone is less than the number of cachelines in the protection bubble.

Type: Grant

Filed: September 10, 2018

Date of Patent: August 17, 2021

Assignee: International Business Machines Corporation

Inventors: Vivek Britto, Mohit Karve, George W. Rohrbaugh, III, Brian W. Thompto
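
Illustrative sketch (not taken from the patent): the allocation rule in the abstract above can be modelled in Python as follows. The zone sizes, measuring both zones as a distance in cachelines from the stream head, and the names `PrefetchStream` and `on_access` are assumptions for illustration; the abstract only states that the safety zone holds fewer cachelines than the protection bubble.

```python
class PrefetchStream:
    def __init__(self, head, safety_zone=2, protection_bubble=4):
        if safety_zone >= protection_bubble:
            raise ValueError("safety zone must be smaller than the bubble")
        self.head = head                      # next cacheline to prefetch
        self.safety_zone = safety_zone
        self.protection_bubble = protection_bubble


def on_access(streams, line):
    """Handle an access to cacheline number `line`.

    Returns "advance" when a stream's safety zone is hit, "blocked" when
    the line falls in a protection bubble (no new stream is allocated),
    and "allocate" when a new stream is created.
    """
    for stream in streams:
        distance = abs(line - stream.head)
        if distance <= stream.safety_zone:
            stream.head = line + 1            # advance the existing stream
            return "advance"
        if distance <= stream.protection_bubble:
            return "blocked"                  # protected: do not allocate
    streams.append(PrefetchStream(head=line + 1))
    return "allocate"
```

In this model, an access just outside the safety zone but inside the bubble neither advances the stream nor spawns a competing one.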