Patents by Inventor Gregory Michael WRIGHT

Gregory Michael WRIGHT has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Speculative transitions among modes with different privilege levels in a block-based microarchitecture

Patent number: 11269640

Abstract: The disclosure relates to processing in-flight blocks in a processor pipeline according to an expected execution mode to reduce synchronization delays that could otherwise arise due to transitions among processor modes with varying privilege levels (e.g., user mode, supervisor mode, hypervisor mode, etc.). More particularly, a program counter associated with an instruction block to be fetched may be translated to one or more execute permissions associated with the instruction block and the instruction block may be associated with a speculative execution mode based at least in part on the one or more execute permissions. Accordingly, the instruction block may be processed relative to the speculative execution mode while in-flight within the processor pipeline.

Type: Grant

Filed: February 13, 2017

Date of Patent: March 8, 2022

Assignee: Qualcomm Incorporated

Inventor: Gregory Michael Wright
Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model

Patent number: 11188336

Abstract: Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model is disclosed. In one aspect, a partial replay controller is provided in a processor(s) of a central processing unit (CPU). If an instruction is detected in the instruction block associated with a potential architectural state modification, or an exception occurs during execution of instructions, the instruction block is re-executed. During re-execution of the instruction block, the partial replay controller is configured to record produced results from load/store instructions. Thus, if an exception occurs during re-execution of the instruction block, previously recorded produced results for the executed load/store instructions before the exception occurred are replayed during re-execution of the instruction block after the exception is resolved.

Type: Grant

Filed: August 31, 2016

Date of Patent: November 30, 2021

Assignee: Qualcomm Incorporated

Inventor: Gregory Michael Wright
Event counter circuits using partitioned moving average determinations and related methods

Patent number: 11070211

Abstract: An event counter circuit can be configured to monitor operation of a system where a moving average register circuit can be configured to store a moving average value updated in each cycle of operation of the system by adding a number of system events occurring during a current cycle of the system operation to either 1) a current moving average value stored in the moving average register circuit or 2) a keep value generated by partitioning the current moving average value into the keep value and a transfer value representing system events not included in a determination of the moving average value for subsequent cycles of operation of the system.

Type: Grant

Filed: May 29, 2020

Date of Patent: July 20, 2021

Assignee: Syzexion, Inc.

Inventor: Gregory Michael Wright
Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices

Patent number: 11048509

Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.

Type: Grant

Filed: June 5, 2018

Date of Patent: June 29, 2021

Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
Providing predictive instruction dispatch throttling to prevent resource overflows in out-of-order processor (OOP)-based devices

Patent number: 10929139

Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. An OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.

Type: Grant

Filed: September 27, 2018

Date of Patent: February 23, 2021

Assignee: Qualcomm Incorporated

Inventors: Lisa Ru-feng Hsu, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
EVENT COUNTER CIRCUITS USING PARTITIONED MOVING AVERAGE DETERMINATIONS AND RELATED METHODS

Publication number: 20200382123

Abstract: An event counter circuit can be configured to monitor operation of a system where a moving average register circuit can be configured to store a moving average value updated in each cycle of operation of the system by adding a number of system events occurring during a current cycle of the system operation to either 1) a current moving average value stored in the moving average register circuit or 2) a keep value generated by partitioning the current moving average value into the keep value and a transfer value representing system events not included in a determination of the moving average value for subsequent cycles of operation of the system

Type: Application

Filed: May 29, 2020

Publication date: December 3, 2020

Inventor: Gregory Michael Wright
Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices

Patent number: 10846260

Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.

Type: Grant

Filed: July 5, 2018

Date of Patent: November 24, 2020

Assignee: Qualcomm Incorporated

Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
Deadlock free resource management in block based computing architectures

Patent number: 10783011

Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPRs count is adequate. The method contemplates counting the number of Register Writers in blocks of code which will write to GPRs which are in process of executing, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there is not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.

Type: Grant

Filed: September 21, 2017

Date of Patent: September 22, 2020

Assignee: Qualcomm Incorporated

Inventors: Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
Providing variable interpretation of usefulness indicators for memory tables in processor-based systems

Patent number: 10725782

Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry.

Type: Grant

Filed: September 12, 2017

Date of Patent: July 28, 2020

Assignee: Qualcomm Incorporated

Inventors: Anil Krishna, Yongseok Yi, Eric Rotenberg, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
Providing memory dependence prediction in block-atomic dataflow architectures

Patent number: 10684859

Abstract: Providing memory dependence prediction in block-atomic dataflow architectures is provided, in one aspect, la a memory dependence prediction circuit. The memory dependence prediction circuit comprises a predictor table configured to store multiple predictor table entries, each comprising a store instruction identifier, a block reach set, and a load set. Using this data, the memory dependence prediction circuit determines, upon a fetch of an instruction block by an execution pipeline, whether the instruction block contains store instructions that reach dependent load instructions. If so, the store instructions are marked as having dependent load instructions to wake. In some aspects, the memory dependence prediction circuit is configured to determine whether the instruction block contains dependent load instructions reached by store instructions. If so, the memory dependence prediction circuit delays execution of the dependent load instructions.

Type: Grant

Filed: September 19, 2016

Date of Patent: June 16, 2020

Assignee: QUALCOMM Incorporated

Inventors: Chen-Han Ho, Gregory Michael Wright
Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices

Patent number: 10628162

Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.

Type: Grant

Filed: June 19, 2018

Date of Patent: April 21, 2020

Assignee: Qualcomm Incorporated

Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
PROVIDING PREDICTIVE INSTRUCTION DISPATCH THROTTLING TO PREVENT RESOURCE OVERFLOWS IN OUT-OF-ORDER PROCESSOR (OOP)-BASED DEVICES

Publication number: 20200104163

Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. In this regard, an OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.

Type: Application

Filed: September 27, 2018

Publication date: April 2, 2020

Inventors: Lisa Ru-feng Hsu, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
PROVIDING EFFICIENT HANDLING OF BRANCH DIVERGENCE IN VECTORIZABLE LOOPS BY VECTOR-PROCESSOR-BASED DEVICES

Publication number: 20200065098

Abstract: Providing efficient handling of branch divergence in vectorizable loops by vector-processor-based devices is disclosed. In some aspects, a vector-processor-based device provides a plurality of processing elements (PEs) coupled to a scheduler circuit comprising a clock cycle threshold and a mask register comprising a plurality of bits corresponding to a plurality of loop iterations of a vectorizable loop to be executed. The scheduler circuit initiates a first execution interval, during which loop iterations of the vectorizable loop are assigned to PEs for parallel execution. If a loop iteration's execution time exceeds the clock cycle threshold, the scheduler circuit sets a mask register bit corresponding to the loop iteration indicating that the loop iteration is incomplete, and defers its execution.

Type: Application

Filed: August 21, 2018

Publication date: February 27, 2020

Inventors: Hadi Parandeh Afshar, Eric Rotenberg, Gregory Michael Wright
PROVIDING RECONFIGURABLE FUSION OF PROCESSING ELEMENTS (PEs) IN VECTOR-PROCESSOR-BASED DEVICES

Publication number: 20200012618

Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.

Type: Application

Filed: July 5, 2018

Publication date: January 9, 2020

Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
ENABLING PARALLEL MEMORY ACCESSES BY PROVIDING EXPLICIT AFFINE INSTRUCTIONS IN VECTOR-PROCESSOR-BASED DEVICES

Publication number: 20190384606

Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.

Type: Application

Filed: June 19, 2018

Publication date: December 19, 2019

Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
PROVIDING MULTI-ELEMENT MULTI-VECTOR (MEMV) REGISTER FILE ACCESS IN VECTOR-PROCESSOR-BASED DEVICES

Publication number: 20190369994

Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.

Type: Application

Filed: June 5, 2018

Publication date: December 5, 2019

Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
DEADLOCK FREE RESOURCE MANAGEMENT IN BLOCK BASED COMPUTING ARCHITECTURES

Publication number: 20190087241

Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPRs count is adequate. The method contemplates counting the number of Register Writers in blocks of code which will write to GPRs which are in process of executing, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there is not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.

Type: Application

Filed: September 21, 2017

Publication date: March 21, 2019

Inventors: Vignyan Reddy KOTHINTI NARESH, Gregory Michael WRIGHT
PROVIDING VARIABLE INTERPRETATION OF USEFULNESS INDICATORS FOR MEMORY TABLES IN PROCESSOR-BASED SYSTEMS

Publication number: 20190079772

Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry.

Type: Application

Filed: September 12, 2017

Publication date: March 14, 2019

Inventors: Anil Krishna, Yongseok Yi, Eric Rotenberg, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
CACHING INSTRUCTION BLOCK HEADER DATA IN BLOCK ARCHITECTURE PROCESSOR-BASED SYSTEMS

Publication number: 20190065060

Abstract: Caching instruction block header data in block architecture processor-based systems is disclosed. In one aspect, a computer processor device, based on a block architecture, provides an instruction block header cache dedicated to caching instruction block header data. Upon a subsequent fetch of an instruction block, cached instruction block header data may be retrieved from the instruction block header cache (if present) and used to optimize processing of the instruction block. In some aspects, the instruction block header data may include a microarchitectural block header (MBH) generated upon the first decoding of the instruction block by an MBH generation circuit. The MBH may contain static or dynamic information about the instructions within the instruction block. As non-limiting examples, the information may include data relating to register reads and writes, load and store operations, branch information, predicate information, special instructions, and/or serial execution preferences.

Type: Application

Filed: August 28, 2017

Publication date: February 28, 2019

Inventors: Anil Krishna, Gregory Michael Wright, Yongseok Yi, Matthew Gilbert, Vignyan Reddy Kothinti Naresh
SELECTIVE REFRESH MECHANISM FOR DRAM

Publication number: 20190013062

Abstract: Systems and methods for selective refresh of a cache, such as a last-level cache implemented as an embedded DRAM (eDRAM). A refresh bit and a reuse bit are associated with each way of at least one set of the cache. A least recently used (LRU) stack tracks positions of the ways, with positions towards a most recently used position of a threshold comprising more recently used positions and positions towards a least recently used position of the threshold comprise less recently used positions. A line in a way is selectively refreshed if the position of the way is one of the more recently used positions and if the refresh bit associated with the way is set, or the position of the way is one of the less recently used positions and if the refresh bit and the reuse bit associated with the way are both set.

Type: Application

Filed: July 7, 2017

Publication date: January 10, 2019

Inventors: Francois Ibrahim ATALLAH, Gregory Michael WRIGHT, Shivam PRIYADARSHI, Garrett Michael DRAPALA, Harold Wade CAIN, III, Erik HEDBERG

1 2 next