Patents by Inventor Gregory Michael WRIGHT

Gregory Michael WRIGHT has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11269640
    Abstract: The disclosure relates to processing in-flight blocks in a processor pipeline according to an expected execution mode to reduce synchronization delays that could otherwise arise due to transitions among processor modes with varying privilege levels (e.g., user mode, supervisor mode, hypervisor mode, etc.). More particularly, a program counter associated with an instruction block to be fetched may be translated to one or more execute permissions associated with the instruction block and the instruction block may be associated with a speculative execution mode based at least in part on the one or more execute permissions. Accordingly, the instruction block may be processed relative to the speculative execution mode while in-flight within the processor pipeline.
    Type: Grant
    Filed: February 13, 2017
    Date of Patent: March 8, 2022
    Assignee: Qualcomm Incorporated
    Inventor: Gregory Michael Wright
  • Patent number: 11188336
    Abstract: Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model is disclosed. In one aspect, a partial replay controller is provided in a processor(s) of a central processing unit (CPU). If an instruction is detected in the instruction block associated with a potential architectural state modification, or an exception occurs during execution of instructions, the instruction block is re-executed. During re-execution of the instruction block, the partial replay controller is configured to record produced results from load/store instructions. Thus, if an exception occurs during re-execution of the instruction block, previously recorded produced results for the executed load/store instructions before the exception occurred are replayed during re-execution of the instruction block after the exception is resolved.
    Type: Grant
    Filed: August 31, 2016
    Date of Patent: November 30, 2021
    Assignee: Qualcomm Incorporated
    Inventor: Gregory Michael Wright
  • Patent number: 11070211
    Abstract: An event counter circuit can be configured to monitor operation of a system where a moving average register circuit can be configured to store a moving average value updated in each cycle of operation of the system by adding a number of system events occurring during a current cycle of the system operation to either 1) a current moving average value stored in the moving average register circuit or 2) a keep value generated by partitioning the current moving average value into the keep value and a transfer value representing system events not included in a determination of the moving average value for subsequent cycles of operation of the system.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: July 20, 2021
    Assignee: Syzexion, Inc.
    Inventor: Gregory Michael Wright
  • Patent number: 11048509
    Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.
    Type: Grant
    Filed: June 5, 2018
    Date of Patent: June 29, 2021
    Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
  • Patent number: 10929139
    Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. An OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: February 23, 2021
    Assignee: Qualcomm Incorporated
    Inventors: Lisa Ru-feng Hsu, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Publication number: 20200382123
    Abstract: An event counter circuit can be configured to monitor operation of a system where a moving average register circuit can be configured to store a moving average value updated in each cycle of operation of the system by adding a number of system events occurring during a current cycle of the system operation to either 1) a current moving average value stored in the moving average register circuit or 2) a keep value generated by partitioning the current moving average value into the keep value and a transfer value representing system events not included in a determination of the moving average value for subsequent cycles of operation of the system
    Type: Application
    Filed: May 29, 2020
    Publication date: December 3, 2020
    Inventor: Gregory Michael Wright
  • Patent number: 10846260
    Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.
    Type: Grant
    Filed: July 5, 2018
    Date of Patent: November 24, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
  • Patent number: 10783011
    Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPRs count is adequate. The method contemplates counting the number of Register Writers in blocks of code which will write to GPRs which are in process of executing, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there is not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.
    Type: Grant
    Filed: September 21, 2017
    Date of Patent: September 22, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Patent number: 10725782
    Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry.
    Type: Grant
    Filed: September 12, 2017
    Date of Patent: July 28, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Anil Krishna, Yongseok Yi, Eric Rotenberg, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Patent number: 10684859
    Abstract: Providing memory dependence prediction in block-atomic dataflow architectures is provided, in one aspect, la a memory dependence prediction circuit. The memory dependence prediction circuit comprises a predictor table configured to store multiple predictor table entries, each comprising a store instruction identifier, a block reach set, and a load set. Using this data, the memory dependence prediction circuit determines, upon a fetch of an instruction block by an execution pipeline, whether the instruction block contains store instructions that reach dependent load instructions. If so, the store instructions are marked as having dependent load instructions to wake. In some aspects, the memory dependence prediction circuit is configured to determine whether the instruction block contains dependent load instructions reached by store instructions. If so, the memory dependence prediction circuit delays execution of the dependent load instructions.
    Type: Grant
    Filed: September 19, 2016
    Date of Patent: June 16, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Chen-Han Ho, Gregory Michael Wright
  • Patent number: 10628162
    Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.
    Type: Grant
    Filed: June 19, 2018
    Date of Patent: April 21, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
  • Publication number: 20200104163
    Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. In this regard, an OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.
    Type: Application
    Filed: September 27, 2018
    Publication date: April 2, 2020
    Inventors: Lisa Ru-feng Hsu, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Publication number: 20200065098
    Abstract: Providing efficient handling of branch divergence in vectorizable loops by vector-processor-based devices is disclosed. In some aspects, a vector-processor-based device provides a plurality of processing elements (PEs) coupled to a scheduler circuit comprising a clock cycle threshold and a mask register comprising a plurality of bits corresponding to a plurality of loop iterations of a vectorizable loop to be executed. The scheduler circuit initiates a first execution interval, during which loop iterations of the vectorizable loop are assigned to PEs for parallel execution. If a loop iteration's execution time exceeds the clock cycle threshold, the scheduler circuit sets a mask register bit corresponding to the loop iteration indicating that the loop iteration is incomplete, and defers its execution.
    Type: Application
    Filed: August 21, 2018
    Publication date: February 27, 2020
    Inventors: Hadi Parandeh Afshar, Eric Rotenberg, Gregory Michael Wright
  • Publication number: 20200012618
    Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.
    Type: Application
    Filed: July 5, 2018
    Publication date: January 9, 2020
    Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
  • Publication number: 20190384606
    Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.
    Type: Application
    Filed: June 19, 2018
    Publication date: December 19, 2019
    Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
  • Publication number: 20190369994
    Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.
    Type: Application
    Filed: June 5, 2018
    Publication date: December 5, 2019
    Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
  • Publication number: 20190087241
    Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPRs count is adequate. The method contemplates counting the number of Register Writers in blocks of code which will write to GPRs which are in process of executing, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there is not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.
    Type: Application
    Filed: September 21, 2017
    Publication date: March 21, 2019
    Inventors: Vignyan Reddy KOTHINTI NARESH, Gregory Michael WRIGHT
  • Publication number: 20190079772
    Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry.
    Type: Application
    Filed: September 12, 2017
    Publication date: March 14, 2019
    Inventors: Anil Krishna, Yongseok Yi, Eric Rotenberg, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Publication number: 20190065060
    Abstract: Caching instruction block header data in block architecture processor-based systems is disclosed. In one aspect, a computer processor device, based on a block architecture, provides an instruction block header cache dedicated to caching instruction block header data. Upon a subsequent fetch of an instruction block, cached instruction block header data may be retrieved from the instruction block header cache (if present) and used to optimize processing of the instruction block. In some aspects, the instruction block header data may include a microarchitectural block header (MBH) generated upon the first decoding of the instruction block by an MBH generation circuit. The MBH may contain static or dynamic information about the instructions within the instruction block. As non-limiting examples, the information may include data relating to register reads and writes, load and store operations, branch information, predicate information, special instructions, and/or serial execution preferences.
    Type: Application
    Filed: August 28, 2017
    Publication date: February 28, 2019
    Inventors: Anil Krishna, Gregory Michael Wright, Yongseok Yi, Matthew Gilbert, Vignyan Reddy Kothinti Naresh
  • Publication number: 20190013062
    Abstract: Systems and methods for selective refresh of a cache, such as a last-level cache implemented as an embedded DRAM (eDRAM). A refresh bit and a reuse bit are associated with each way of at least one set of the cache. A least recently used (LRU) stack tracks positions of the ways, with positions towards a most recently used position of a threshold comprising more recently used positions and positions towards a least recently used position of the threshold comprise less recently used positions. A line in a way is selectively refreshed if the position of the way is one of the more recently used positions and if the refresh bit associated with the way is set, or the position of the way is one of the less recently used positions and if the refresh bit and the reuse bit associated with the way are both set.
    Type: Application
    Filed: July 7, 2017
    Publication date: January 10, 2019
    Inventors: Francois Ibrahim ATALLAH, Gregory Michael WRIGHT, Shivam PRIYADARSHI, Garrett Michael DRAPALA, Harold Wade CAIN, III, Erik HEDBERG