Patents by Inventor Gregory Michael WRIGHT
Gregory Michael WRIGHT has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11269640Abstract: The disclosure relates to processing in-flight blocks in a processor pipeline according to an expected execution mode to reduce synchronization delays that could otherwise arise due to transitions among processor modes with varying privilege levels (e.g., user mode, supervisor mode, hypervisor mode, etc.). More particularly, a program counter associated with an instruction block to be fetched may be translated to one or more execute permissions associated with the instruction block and the instruction block may be associated with a speculative execution mode based at least in part on the one or more execute permissions. Accordingly, the instruction block may be processed relative to the speculative execution mode while in-flight within the processor pipeline.Type: GrantFiled: February 13, 2017Date of Patent: March 8, 2022Assignee: Qualcomm IncorporatedInventor: Gregory Michael Wright
-
Patent number: 11188336Abstract: Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model is disclosed. In one aspect, a partial replay controller is provided in a processor(s) of a central processing unit (CPU). If an instruction is detected in the instruction block associated with a potential architectural state modification, or an exception occurs during execution of instructions, the instruction block is re-executed. During re-execution of the instruction block, the partial replay controller is configured to record produced results from load/store instructions. Thus, if an exception occurs during re-execution of the instruction block, previously recorded produced results for the executed load/store instructions before the exception occurred are replayed during re-execution of the instruction block after the exception is resolved.Type: GrantFiled: August 31, 2016Date of Patent: November 30, 2021Assignee: Qualcomm IncorporatedInventor: Gregory Michael Wright
-
Patent number: 11070211Abstract: An event counter circuit can be configured to monitor operation of a system where a moving average register circuit can be configured to store a moving average value updated in each cycle of operation of the system by adding a number of system events occurring during a current cycle of the system operation to either 1) a current moving average value stored in the moving average register circuit or 2) a keep value generated by partitioning the current moving average value into the keep value and a transfer value representing system events not included in a determination of the moving average value for subsequent cycles of operation of the system.Type: GrantFiled: May 29, 2020Date of Patent: July 20, 2021Assignee: Syzexion, Inc.Inventor: Gregory Michael Wright
-
Patent number: 11048509Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.Type: GrantFiled: June 5, 2018Date of Patent: June 29, 2021Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
-
Patent number: 10929139Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. An OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.Type: GrantFiled: September 27, 2018Date of Patent: February 23, 2021Assignee: Qualcomm IncorporatedInventors: Lisa Ru-feng Hsu, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
-
Publication number: 20200382123Abstract: An event counter circuit can be configured to monitor operation of a system where a moving average register circuit can be configured to store a moving average value updated in each cycle of operation of the system by adding a number of system events occurring during a current cycle of the system operation to either 1) a current moving average value stored in the moving average register circuit or 2) a keep value generated by partitioning the current moving average value into the keep value and a transfer value representing system events not included in a determination of the moving average value for subsequent cycles of operation of the systemType: ApplicationFiled: May 29, 2020Publication date: December 3, 2020Inventor: Gregory Michael Wright
-
Patent number: 10846260Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.Type: GrantFiled: July 5, 2018Date of Patent: November 24, 2020Assignee: Qualcomm IncorporatedInventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
-
Patent number: 10783011Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPRs count is adequate. The method contemplates counting the number of Register Writers in blocks of code which will write to GPRs which are in process of executing, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there is not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.Type: GrantFiled: September 21, 2017Date of Patent: September 22, 2020Assignee: Qualcomm IncorporatedInventors: Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
-
Patent number: 10725782Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry.Type: GrantFiled: September 12, 2017Date of Patent: July 28, 2020Assignee: Qualcomm IncorporatedInventors: Anil Krishna, Yongseok Yi, Eric Rotenberg, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
-
Patent number: 10684859Abstract: Providing memory dependence prediction in block-atomic dataflow architectures is provided, in one aspect, la a memory dependence prediction circuit. The memory dependence prediction circuit comprises a predictor table configured to store multiple predictor table entries, each comprising a store instruction identifier, a block reach set, and a load set. Using this data, the memory dependence prediction circuit determines, upon a fetch of an instruction block by an execution pipeline, whether the instruction block contains store instructions that reach dependent load instructions. If so, the store instructions are marked as having dependent load instructions to wake. In some aspects, the memory dependence prediction circuit is configured to determine whether the instruction block contains dependent load instructions reached by store instructions. If so, the memory dependence prediction circuit delays execution of the dependent load instructions.Type: GrantFiled: September 19, 2016Date of Patent: June 16, 2020Assignee: QUALCOMM IncorporatedInventors: Chen-Han Ho, Gregory Michael Wright
-
Patent number: 10628162Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.Type: GrantFiled: June 19, 2018Date of Patent: April 21, 2020Assignee: Qualcomm IncorporatedInventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
-
Publication number: 20200104163Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. In this regard, an OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.Type: ApplicationFiled: September 27, 2018Publication date: April 2, 2020Inventors: Lisa Ru-feng Hsu, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
-
Publication number: 20200065098Abstract: Providing efficient handling of branch divergence in vectorizable loops by vector-processor-based devices is disclosed. In some aspects, a vector-processor-based device provides a plurality of processing elements (PEs) coupled to a scheduler circuit comprising a clock cycle threshold and a mask register comprising a plurality of bits corresponding to a plurality of loop iterations of a vectorizable loop to be executed. The scheduler circuit initiates a first execution interval, during which loop iterations of the vectorizable loop are assigned to PEs for parallel execution. If a loop iteration's execution time exceeds the clock cycle threshold, the scheduler circuit sets a mask register bit corresponding to the loop iteration indicating that the loop iteration is incomplete, and defers its execution.Type: ApplicationFiled: August 21, 2018Publication date: February 27, 2020Inventors: Hadi Parandeh Afshar, Eric Rotenberg, Gregory Michael Wright
-
Publication number: 20200012618Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.Type: ApplicationFiled: July 5, 2018Publication date: January 9, 2020Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
-
Publication number: 20190384606Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.Type: ApplicationFiled: June 19, 2018Publication date: December 19, 2019Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
-
Publication number: 20190369994Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.Type: ApplicationFiled: June 5, 2018Publication date: December 5, 2019Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
-
Publication number: 20190087241Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPRs count is adequate. The method contemplates counting the number of Register Writers in blocks of code which will write to GPRs which are in process of executing, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there is not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.Type: ApplicationFiled: September 21, 2017Publication date: March 21, 2019Inventors: Vignyan Reddy KOTHINTI NARESH, Gregory Michael WRIGHT
-
Publication number: 20190079772Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry.Type: ApplicationFiled: September 12, 2017Publication date: March 14, 2019Inventors: Anil Krishna, Yongseok Yi, Eric Rotenberg, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
-
Publication number: 20190065060Abstract: Caching instruction block header data in block architecture processor-based systems is disclosed. In one aspect, a computer processor device, based on a block architecture, provides an instruction block header cache dedicated to caching instruction block header data. Upon a subsequent fetch of an instruction block, cached instruction block header data may be retrieved from the instruction block header cache (if present) and used to optimize processing of the instruction block. In some aspects, the instruction block header data may include a microarchitectural block header (MBH) generated upon the first decoding of the instruction block by an MBH generation circuit. The MBH may contain static or dynamic information about the instructions within the instruction block. As non-limiting examples, the information may include data relating to register reads and writes, load and store operations, branch information, predicate information, special instructions, and/or serial execution preferences.Type: ApplicationFiled: August 28, 2017Publication date: February 28, 2019Inventors: Anil Krishna, Gregory Michael Wright, Yongseok Yi, Matthew Gilbert, Vignyan Reddy Kothinti Naresh
-
Publication number: 20190013062Abstract: Systems and methods for selective refresh of a cache, such as a last-level cache implemented as an embedded DRAM (eDRAM). A refresh bit and a reuse bit are associated with each way of at least one set of the cache. A least recently used (LRU) stack tracks positions of the ways, with positions towards a most recently used position of a threshold comprising more recently used positions and positions towards a least recently used position of the threshold comprise less recently used positions. A line in a way is selectively refreshed if the position of the way is one of the more recently used positions and if the refresh bit associated with the way is set, or the position of the way is one of the less recently used positions and if the refresh bit and the reuse bit associated with the way are both set.Type: ApplicationFiled: July 7, 2017Publication date: January 10, 2019Inventors: Francois Ibrahim ATALLAH, Gregory Michael WRIGHT, Shivam PRIYADARSHI, Garrett Michael DRAPALA, Harold Wade CAIN, III, Erik HEDBERG