Patents by Inventor Brian Michael Stempel

Brian Michael Stempel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11550723
    Abstract: An apparatus, method, and system for memory bandwidth aware data prefetching is presented. The method may comprise monitoring a number of request responses received in an interval at a current prefetch request generation rate, comparing the number of request responses received in the interval to at least a first threshold, and adjusting the current prefetch request generation rate to an updated prefetch request generation rate by selecting the updated prefetch request generation rate from a plurality of prefetch request generation rates, based on the comparison. The request responses may be NACK or RETRY responses. The method may further comprise either retaining a current prefetch request generation rate or selecting a maximum prefetch request generation rate as the updated prefetch request generation rate in response to an indication that prefetching is accurate.
    Type: Grant
    Filed: August 27, 2018
    Date of Patent: January 10, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Niket Choudhary, David Scott Ray, Thomas Philip Speier, Eric Robinson, Harold Wade Cain, III, Nikhil Narendradev Sharma, Joseph Gerald McDonald, Brian Michael Stempel, Garrett Michael Drapala
  • Patent number: 10956162
    Abstract: Operand-based reach explicit dataflow processors, and related methods and computer-readable media are disclosed. The operand-based reach explicit dataflow processors support execution of a producer instruction that explicitly names a target consumer operand of a consumer instruction in a consumer operand encoding namespace of the producer instruction. The produced value from execution of the producer instruction is provided or otherwise made available as an input to the named target consumer operand of the consumer instruction as a result of processing the producer instruction. The target consumer operand is encoded in the producer instruction as an operand target distance relative to the producer instruction. Instructions in an instruction stream between the producer instruction and the targeted consumer instruction that have no operands do not consume an operand reach namespace in the producer instructions.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: March 23, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Robert Douglas Clancy, Melinda Joyce Brown, Yusuf Cagatay Tekmen, Brian Michael Stempel, Michael Scott Mcilvaine, Thomas Philip Speier, Rodney Wayne Smith, Gagan Gupta, David Tennyson Harper, III
  • Publication number: 20200409712
    Abstract: Operand-based reach explicit dataflow processors, and related methods and computer-readable media are disclosed. The operand-based reach explicit dataflow processors support execution of a producer instruction that explicitly names a target consumer operand of a consumer instruction in a consumer operand encoding namespace of the producer instruction. The produced value from execution of the producer instruction is provided or otherwise made available as an input to the named target consumer operand of the consumer instruction as a result of processing the producer instruction. The target consumer operand is encoded in the producer instruction as an operand target distance relative to the producer instruction. Instructions in an instruction stream between the producer instruction and the targeted consumer instruction that have no operands do not consume an operand reach namespace in the producer instructions.
    Type: Application
    Filed: June 28, 2019
    Publication date: December 31, 2020
    Inventors: Robert Douglas CLANCY, Melinda Joyce BROWN, Yusuf Cagatay TEKMEN, Brian Michael STEMPEL, Michael Scott MCILVAINE, Thomas Philip SPEIER, Rodney Wayne SMITH, Gagan GUPTA, David Tennyson HARPER, III
  • Patent number: 10877895
    Abstract: A method, apparatus, and system for prefetching exclusive cache coherence state for store instructions is disclosed. An apparatus may comprise a cache and a gather buffer coupled to the cache. The gather buffer may be configured to store a plurality of cache lines, each cache line of the plurality of cache lines associated with a store instruction. The gather buffer may be further configured to determine whether a first cache line associated with a first store instruction should be allocated in the cache. If the first cache line associated with the first store instruction is to be allocated in the cache, the gather buffer is configured to issue a pre-write request to acquire exclusive cache coherency state to the first cache line associated with the first store instruction.
    Type: Grant
    Filed: August 27, 2018
    Date of Patent: December 29, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Luke Yen, Niket Choudhary, Pritha Ghoshal, Thomas Philip Speier, Brian Michael Stempel, William James McAvoy, Patrick Eibl
  • Publication number: 20200065247
    Abstract: An apparatus, method, and system for memory bandwidth aware data prefetching is presented. The method may comprise monitoring a number of request responses received in an interval at a current prefetch request generation rate, comparing the number of request responses received in the interval to at least a first threshold, and adjusting the current prefetch request generation rate to an updated prefetch request generation rate by selecting the updated prefetch request generation rate from a plurality of prefetch request generation rates, based on the comparison. The request responses may be NACK or RETRY responses. The method may further comprise either retaining a current prefetch request generation rate or selecting a maximum prefetch request generation rate as the updated prefetch request generation rate in response to an indication that prefetching is accurate.
    Type: Application
    Filed: August 27, 2018
    Publication date: February 27, 2020
    Inventors: Niket CHOUDHARY, David Scott RAY, Thomas Philip SPEIER, Eric ROBINSON, Harold Wade CAIN, III, Nikhil Narendradev SHARMA, Joseph Gerald MCDONALD, Brian Michael STEMPEL, Garrett Michael DRAPALA
  • Publication number: 20200065006
    Abstract: A method, apparatus, and system for prefetching exclusive cache coherence state for store instructions is disclosed. An apparatus may comprise a cache and a gather buffer coupled to the cache. The gather buffer may be configured to store a plurality of cache lines, each cache line of the plurality of cache lines associated with a store instruction. The gather buffer may be further configured to determine whether a first cache line associated with a first store instruction should be allocated in the cache. If the first cache line associated with the first store instruction is to be allocated in the cache, the gather buffer is configured to issue a pre-write request to acquire exclusive cache coherency state to the first cache line associated with the first store instruction.
    Type: Application
    Filed: August 27, 2018
    Publication date: February 27, 2020
    Inventors: Luke YEN, Niket CHOUDHARY, Pritha GHOSHAL, Thomas Philip SPEIER, Brian Michael STEMPEL, William James MCAVOY, Patrick EIBL
  • Patent number: 10108419
    Abstract: Systems and methods for dependency-prediction include executing instructions in an instruction pipeline of a processor and detecting a conditionality-imposing control instruction, such as an If-Then (IT) instruction, which imposes dependent behavior on a conditionality block size of one or more dependent instructions. Prior to executing a first instruction, a dependency-prediction is made to determine if the first instruction is a dependent instruction of the conditionality-imposing control instruction, based on the conditionality block size and one or more parameters of the instruction pipeline. The first instruction is executed based on the dependency-prediction. When the first instruction is dependency-mispredicted, an associated dependency-misprediction penalty is mitigated. If the first instruction is a branch instruction, the mitigation involves training a branch prediction tracking mechanism to correctly dependency-predict future occurrences of the first instruction.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: October 23, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Brian Michael Stempel, James Norris Dieffenderfer, Michael Scott McIlvaine, Melinda Joyce Brown
  • Publication number: 20180285269
    Abstract: Aggregating cache maintenance instructions in processor-based devices is disclosed. In this regard, a processor-based device comprises one or more processing elements (PEs), each providing an aggregation circuit configured to detect a first cache maintenance instruction in an instruction stream. The aggregation circuit then aggregates one or more subsequent, consecutive cache maintenance instructions in the instruction stream with the first cache maintenance instruction until an end condition is detected (e.g., detection of a data synchronization barrier instruction or a cache maintenance instruction targeting a non-consecutive memory address or a different memory page than a previous cache maintenance instruction, and/or detection that an aggregation limit has been exceeded). After detecting the end condition, the aggregation circuit generates a single cache maintenance request representing the aggregated cache maintenance instructions.
    Type: Application
    Filed: April 2, 2018
    Publication date: October 4, 2018
    Inventors: William James McAvoy, Thomas Philip Speier, Brian Michael Stempel
  • Publication number: 20180089094
    Abstract: Systems and methods for precise invalidation of cache lines of a virtually indexed virtually tagged (VIVT) cache include associating, with each cache line of the VIVT cache, at least a translation lookaside buffer (TLB) index corresponding to a TLB entry which comprises a virtual address to physical address translation for the cache line. The TLB entries are inclusive of the cache lines of the VIVT cache. Upon receiving an invalidate instruction, the invalidate instruction is filtered at the TLB to determine if the invalidate instruction might affect cache lines in the VIVT cache. If the invalidate instruction might affect cache lines in the VIVT cache, the TLB indices of the TLB entries which match the invalidate instruction are determined, and only the cache lines of the VIVT cache which are associated with the affected TLB indices are selectively invalidated.
    Type: Application
    Filed: September 23, 2016
    Publication date: March 29, 2018
    Inventors: Robert Douglas CLANCY, Gaurav MEHTA, Spencer Ellis WILLIAMS, Brian Michael STEMPEL, Thomas Philip SPEIER, Michael Scott MCILVAINE, William James MCAVOY
  • Patent number: 9823929
    Abstract: A processor includes a queue for storing instructions processed within the context of a current value of a register field, where for some embodiments the instruction is undefined or defined, depending upon the register field at time of processing. After a write instruction (an instruction that writes to the register field) executes, the queue is searched for any entries that contain instructions that depend upon the executed write instruction. Each such entry stores the value of the register field at the time the instruction in the entry was processed. If such an entry is found in the queue and its stored value of the register field does not match the value that the write instruction wrote to the register field, then the processor flushes the pipeline and restarts at a state so as to correctly execute the instruction.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 21, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Daren Eugene Streett, Brian Michael Stempel, Thomas Philip Speier, Rodney Wayne Smith, Michael Scott McIlvaine, Kenneth Alan Dockser, James Norris Dieffenderfer
  • Patent number: 9477476
    Abstract: Fusing immediate value, write-based instructions in instruction processing circuits, and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction indicating an operation writing an immediate value to a register is detected by an instruction processing circuit. The circuit also detects at least one subsequent instruction indicating an operation that overwrites at least one first portion of the register while maintaining a value of a second portion of the register. The at least one subsequent instruction is converted (or replaced) with a fused instruction(s), which indicates an operation writing the at least one first portion and the second portion of the register.
    Type: Grant
    Filed: November 27, 2012
    Date of Patent: October 25, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Melinda J. Brown, Michael William Morrow, James Norris Dieffenderfer, Brian Michael Stempel, Michael Scott McIlvaine, Rodney Wayne Smith, Jeffrey M. Schottmiller, Andrew S. Irwin
  • Patent number: 9477478
    Abstract: The disclosure relates to predicting simple and polymorphic branch instructions. An embodiment of the disclosure detects that a program instruction is a branch instruction, determines whether a program counter for the branch instruction is stored in a program counter filter, and, if the program counter is stored in the program counter filter, prevents the program counter from being stored in a first level predictor.
    Type: Grant
    Filed: May 16, 2012
    Date of Patent: October 25, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Kulin N. Kothari, Michael William Morrow, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel, Daren Eugene Streett
  • Patent number: 9460018
    Abstract: Systems and methods are disclosed for maintaining an instruction cache including extended cache lines and page attributes for main cache line portions of the extended cache lines and, at least for one or more predefined potential page-crossing instruction locations, additional page attributes for extra data portions of the corresponding extended cache lines. In addition, systems and methods are disclosed for processing page-crossing instructions fetched from an instruction cache having extended cache lines.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: October 4, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Leslie Mark DeBruyne, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel
  • Patent number: 9411590
    Abstract: An apparatus and method for executing call branch and return branch instructions in a processor by utilizing a link register stack. The processor includes a branch counter that is initialized to zero, and is set to zero each time the processor decodes a link register manipulating instruction other than a call branch instruction. The branch counter is incremented by one each time a call branch instruction is decoded and an address is pushed onto the link register stack. In response to decoding a return branch instruction and provided the branch counter is not zero, a target address for the decoded return branch instruction is popped off the link register stack, the branch counter is decremented, and there is no need to check the target address for correctness.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: August 9, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Rodney Wayne Smith, Jeffery M. Schottmiller, Michael Scott McIlvaine, Brian Michael Stempel, Melinda J. Brown, Daren Eugene Streett
  • Patent number: 9329930
    Abstract: Aspects disclosed herein include cache memory error detection circuits for detecting bit flips in valid indicators (e.g., valid bits) in cache memory following invalidate operations. Related methods and processor-based systems are also disclosed. If a cache hit results from access to a cache entry following an invalidate operation, a bit flip(s) has occurred in a valid indicator of the cache entry. This is because the valid indicator should indicate an invalid state following the invalidate operation of the cache entry, as opposed to a valid state. Thus, a cache memory error detection circuit is configured to determine if an invalidate operation was performed on the cache entry. The cache memory error detection circuit can cause a cache miss or an error for the accessed cache entry to be generated as a result, even though the valid indicator for the cache entry indicates a valid state due to the bit flip(s).
    Type: Grant
    Filed: April 18, 2014
    Date of Patent: May 3, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: John Sumner Ingalls, Brian Michael Stempel, Thomas Philip Speier
  • Patent number: 9317293
    Abstract: Establishing a branch target instruction cache (BTIC) entry for subroutine returns to reduce pipeline bubbles, and related systems, methods, and computer-readable media are disclosed. In one embodiment, a method of establishing a BTIC entry includes detecting a subroutine call in an execution pipeline. In response, at least one instruction fetched sequential to the subroutine call is written as a branch target instruction in a BTIC entry for a subroutine return. A next instruction fetch address is calculated, and is written into a next instruction fetch address field in the BTIC entry. In this manner, the BTIC may provide correct branch target instruction and next instruction fetch address data for the subroutine return, even if the subroutine return is encountered for the first time or the subroutine is called from different calling locations.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: April 19, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: James Norris Dieffenderfer, Michael William Morrow, Michael Scott McIlvaine, Daren Eugene Streett, Vimal K. Reddy, Brian Michael Stempel
  • Publication number: 20160092221
    Abstract: Systems and methods for dependency-prediction include executing instructions in an instruction pipeline of a processor and detecting a conditionality-imposing control instruction, such as an If-Then (IT) instruction, which imposes dependent behavior on a conditionality block size of one or more dependent instructions. Prior to executing a first instruction, a dependency-prediction is made to determine if the first instruction is a dependent instruction of the conditionality-imposing control instruction, based on the conditionality block size and one or more parameters of the instruction pipeline. The first instruction is executed based on the dependency-prediction. When the first instruction is dependency-mispredicted, an associated dependency-misprediction penalty is mitigated. If the first instruction is a branch instruction, the mitigation involves training a branch prediction tracking mechanism to correctly dependency-predict future occurrences of the first instruction.
    Type: Application
    Filed: September 26, 2014
    Publication date: March 31, 2016
    Inventors: Brian Michael STEMPEL, James Norris DIEFFENDERFER, Michael Scott MCILVAINE, Melinda BROWN
  • Patent number: 9195466
    Abstract: Fusing conditional write instructions having opposite conditions in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first conditional write instruction writing a first value to a target register based on evaluating a first condition is detected by an instruction processing circuit. The circuit also detects a second conditional write instruction writing a second value to the target register based on evaluating a second condition that is a logical opposite of the first condition. Either the first condition or the second condition is selected as a fused instruction condition, and corresponding values are selected as if-true and if-false values. A fused instruction is generated for selectively writing the if-true value to the target register if the fused instruction condition evaluates to true, and selectively writing the if-false value to the target register if the fused instruction condition evaluates to false.
    Type: Grant
    Filed: November 14, 2012
    Date of Patent: November 24, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Melinda J. Brown, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel, Rodney Wayne Smith, Jeffery M. Schottmiller, Andrew S. Irwin, Michael William Morrow
  • Publication number: 20150301884
    Abstract: Aspects disclosed herein include cache memory error detection circuits for detecting bit flips in valid indicators (e.g., valid bits) in cache memory following invalidate operations. Related methods and processor-based systems are also disclosed. If a cache hit results from access to a cache entry following an invalidate operation, a bit flip(s) has occurred in a valid indicator of the cache entry. This is because the valid indicator should indicate an invalid state following the invalidate operation of the cache entry, as opposed to a valid state. Thus, a cache memory error detection circuit is configured to determine if an invalidate operation was performed on the cache entry. The cache memory error detection circuit can cause a cache miss or an error for the accessed cache entry to be generated as a result, even though the valid indicator for the cache entry indicates a valid state due to the bit flip(s).
    Type: Application
    Filed: April 18, 2014
    Publication date: October 22, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: John Sumner Ingalls, Brian Michael Stempel, Thomas Philip Speier
  • Patent number: 9146741
    Abstract: Eliminating redundant masking operations in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction in an instruction stream indicating an operation writing a value to a first register is detected by an instruction processing circuit, the value having a value size less than a size of the first register. The circuit also detects a second instruction in the instruction stream indicating a masking operation on the first register. The masking operation is eliminated upon a determination that the masking operation indicates a read operation and a write operation on the first register and has an identity mask size equal to or greater than the value size. In this manner, the elimination of the masking operation avoids potential read-after-write hazards and improves performance of a CPU by removing redundant operations from an execution pipeline.
    Type: Grant
    Filed: October 19, 2012
    Date of Patent: September 29, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Melinda J. Brown, Michael William Morrow, James Norris Dieffenderfer, Brian Michael Stempel, Michael Scott McIlvaine