Patents by Inventor Brian Stempel

Brian Stempel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11593117
    Abstract: Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: February 28, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Harsh Thakker, Thomas Philip Speier, Rodney Wayne Smith, Kevin Jaget, James Norris Dieffenderfer, Michael Morrow, Pritha Ghoshal, Yusuf Cagatay Tekmen, Brian Stempel, Sang Hoon Lee, Manish Garg
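The combining idea in patent 11593117's abstract lends itself to a small behavioral model. The Python sketch below is not Qualcomm's implementation; the 8-byte bus width, 64-byte cache line, and the MemOp/try_fuse names are assumptions made purely to show how two narrow accesses off a common base register could collapse into one wider access.

```python
# Minimal sketch (assumed parameters, not the patented hardware): fuse two narrow
# memory accesses that share a base register and fall on the same cache line.
from dataclasses import dataclass

BUS_BYTES = 8          # assumed available bus width
CACHE_LINE_BYTES = 64  # assumed cache line size

@dataclass
class MemOp:
    kind: str      # "load" or "store"
    base_reg: str  # base address register name
    offset: int    # byte offset from the base register
    size: int      # access size in bytes

def same_cache_line(a: MemOp, b: MemOp) -> bool:
    # With a common base register, line membership can be judged from offsets alone.
    return a.offset // CACHE_LINE_BYTES == b.offset // CACHE_LINE_BYTES

def try_fuse(a: MemOp, b: MemOp):
    """Return a single wider MemOp replacing a and b, or None if they cannot fuse."""
    if a.kind != b.kind or a.base_reg != b.base_reg:
        return None                     # must be the same kind and a common base register
    if not same_cache_line(a, b):
        return None                     # must be satisfiable by one cache-line access
    lo, hi = sorted((a, b), key=lambda op: op.offset)
    span = (hi.offset + hi.size) - lo.offset
    if span > BUS_BYTES:
        return None                     # fused access must still fit the available bus
    return MemOp(a.kind, a.base_reg, lo.offset, span)

if __name__ == "__main__":
    ld0 = MemOp("load", "r1", 0, 4)
    ld1 = MemOp("load", "r1", 4, 4)
    print(try_fuse(ld0, ld1))  # MemOp(kind='load', base_reg='r1', offset=0, size=8)
```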
  • Patent number: 11061822
    Abstract: A method, apparatus, and system for reducing pipeline stalls due to address translation misses is presented. An apparatus comprises a memory access instruction pipeline, a translation lookaside buffer (TLB) coupled to the memory access instruction pipeline, and a TLB miss queue coupled to both the TLB and the memory access instruction pipeline. The TLB miss queue is configured to selectively store a first memory access instruction that has been removed from the memory access instruction pipeline as a result of the first memory access instruction missing in the TLB, along with information associated with the first memory access instruction. The TLB miss queue is further configured to reintroduce the first memory access instruction to the memory access instruction pipeline in association with a return of an address translation related to the first memory access instruction.
    Type: Grant
    Filed: August 27, 2018
    Date of Patent: July 13, 2021
    Assignee: Qualcomm Incorporated
    Inventors: Pritha Ghoshal, Niket Choudhary, Ravi Rajagopalan, Patrick Eibl, Brian Stempel, David Scott Ray, Thomas Philip Speier
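To make the TLB-miss-queue idea from patent 11061822 concrete, here is a toy Python model. The class and method names (TLBMissQueue, translation_returned, and so on) are invented for illustration; the point is only that a missing instruction is parked rather than stalling the pipeline, and is replayed when its translation arrives.

```python
# Toy model (not the patented RTL): a memory access that misses in the TLB is
# parked in a miss queue instead of stalling the pipeline, then replayed when
# its translation returns.
from collections import deque

class TLB:
    def __init__(self):
        self.entries = {}                    # virtual page -> physical page
    def lookup(self, vpage):
        return self.entries.get(vpage)

class TLBMissQueue:
    def __init__(self, tlb, pipeline):
        self.tlb, self.pipeline = tlb, pipeline
        self.waiting = deque()               # (instruction, virtual page) pairs

    def issue(self, insn, vpage):
        if self.tlb.lookup(vpage) is None:
            # Remove the instruction from the pipeline and remember why it waits.
            self.waiting.append((insn, vpage))
            return "parked"
        return "issued"

    def translation_returned(self, vpage, ppage):
        # Fill the TLB, then reintroduce every instruction that was waiting on
        # this translation back into the pipeline.
        self.tlb.entries[vpage] = ppage
        still_waiting = deque()
        for insn, wanted in self.waiting:
            if wanted == vpage:
                self.pipeline.append(insn)   # replay
            else:
                still_waiting.append((insn, wanted))
        self.waiting = still_waiting

if __name__ == "__main__":
    pipe, tlb = [], TLB()
    q = TLBMissQueue(tlb, pipe)
    print(q.issue("LDR x0,[x1]", vpage=0x40))   # 'parked' (TLB miss)
    q.translation_returned(0x40, 0x1040)
    print(pipe)                                 # the load is reintroduced to the pipeline
```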
  • Publication number: 20200065260
    Abstract: A method, apparatus, and system for reducing pipeline stalls due to address translation misses is presented. An apparatus comprises a memory access instruction pipeline, a translation lookaside buffer (TLB) coupled to the memory access instruction pipeline, and a TLB miss queue coupled to both the TLB and the memory access instruction pipeline. The TLB miss queue is configured to selectively store a first memory access instruction that has been removed from the memory access instruction pipeline as a result of the first memory access instruction missing in the TLB, along with information associated with the first memory access instruction. The TLB miss queue is further configured to reintroduce the first memory access instruction to the memory access instruction pipeline in association with a return of an address translation related to the first memory access instruction.
    Type: Application
    Filed: August 27, 2018
    Publication date: February 27, 2020
    Inventors: Pritha Ghoshal, Niket Choudhary, Ravi Rajagopalan, Patrick Eibl, Brian Stempel, David Scott Ray, Thomas Philip Speier
  • Publication number: 20200004550
    Abstract: Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.
    Type: Application
    Filed: June 29, 2018
    Publication date: January 2, 2020
    Inventors: Harsh Thakker, Thomas Philip Speier, Rodney Wayne Smith, Kevin Jaget, James Norris Dieffenderfer, Michael Morrow, Pritha Ghoshal, Yusuf Cagatay Tekmen, Brian Stempel, Sang Hoon Lee, Manish Garg
  • Patent number: 10318436
    Abstract: A translation lookaside buffer (TLB) index valid bit is set in a first line of a virtually indexed, virtually tagged (VIVT) cache. The first line of the VIVT cache is associated with a first TLB entry which stores a virtual address to physical address translation for the first cache line. The TLB index valid bit of the first line is cleared upon determining that the translation is no longer stored in the first TLB entry. An indication of a received invalidation instruction is stored. When a context synchronization instruction is received, the first line of the VIVT cache is cleared based on the TLB index valid bit being cleared and the stored indication of the invalidation instruction.
    Type: Grant
    Filed: July 25, 2017
    Date of Patent: June 11, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: William McAvoy, Brian Stempel, Spencer Williams, Robert Douglas Clancy, Michael Scott McIlvaine, Thomas Philip Speier
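The deferred-invalidation scheme of patent 10318436 can be sketched in a few lines of Python. The structures below (VIVTLine, the pending_invalidate flag) are assumptions, not the patented hardware; they only illustrate clearing the TLB index valid bit when a translation leaves the TLB and flushing the affected lines at the next context synchronization point.

```python
# Rough sketch (assumed structures): each line of a virtually indexed, virtually
# tagged (VIVT) cache remembers which TLB entry held its translation.  If that
# TLB entry is replaced, the line's "TLB index valid" bit is cleared.  A TLB
# invalidate is only recorded; affected lines are actually flushed later, at the
# next context synchronization point.

class VIVTLine:
    def __init__(self, vtag, tlb_index):
        self.vtag = vtag
        self.valid = True
        self.tlb_index = tlb_index
        self.tlb_index_valid = True      # translation still lives in that TLB entry

class VIVTCache:
    def __init__(self):
        self.lines = []
        self.pending_invalidate = False  # stored indication of an invalidate instruction

    def tlb_entry_replaced(self, tlb_index):
        # The translation backing these lines is gone from the TLB.
        for line in self.lines:
            if line.tlb_index == tlb_index:
                line.tlb_index_valid = False

    def tlb_invalidate_instruction(self):
        self.pending_invalidate = True   # defer the work

    def context_synchronization(self):
        # Only now do the deferred invalidations take effect.
        if self.pending_invalidate:
            for line in self.lines:
                if not line.tlb_index_valid:
                    line.valid = False
            self.pending_invalidate = False

if __name__ == "__main__":
    cache = VIVTCache()
    cache.lines.append(VIVTLine(vtag=0x1000, tlb_index=3))
    cache.tlb_entry_replaced(3)          # translation leaves TLB entry 3
    cache.tlb_invalidate_instruction()   # an invalidation instruction is seen
    cache.context_synchronization()      # the line is cleared only now
    print(cache.lines[0].valid)          # False
```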
  • Publication number: 20190034349
    Abstract: A translation lookaside buffer (TLB) index valid bit is set in a first line of a virtually indexed, virtually tagged (VIVT) cache. The first line of the VIVT cache is associated with a first TLB entry which stores a virtual address to physical address translation for the first cache line. The TLB index valid bit of the first line is cleared upon determining that the translation is no longer stored in the first TLB entry. An indication of a received invalidation instruction is stored. When a context synchronization instruction is received, the first line of the VIVT cache is cleared based on the TLB index valid bit being cleared and the stored indication of the invalidate instruction.
    Type: Application
    Filed: July 25, 2017
    Publication date: January 31, 2019
    Inventors: William McAvoy, Brian Stempel, Spencer Williams, Robert Douglas Clancy, Michael Scott McIlvaine, Thomas Philip Speier
  • Publication number: 20070260854
    Abstract: A pre-decoder in a variable instruction length processor indicates properties of instructions in pre-decode bits stored in an instruction cache with the instructions. When all the encodings of the pre-decode bits associated with one instruction length are defined, a property of an instruction of that length may be indicated by altering the instruction to emulate an instruction of a different length, and encoding the property in the pre-decode bits associated with instructions of the different length. One example of a property that may be so indicated is an undefined instruction.
    Type: Application
    Filed: May 4, 2006
    Publication date: November 8, 2007
    Inventors: Rodney Smith, Brian Stempel
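A rough illustration of the pre-decode trick in publication 20070260854 follows. The 2-bit pre-decode codes and the specific encodings are invented; the sketch only shows an undefined 16-bit instruction being dressed up as a 32-bit one so the property can be expressed in the 32-bit pre-decode space.

```python
# Illustrative only (all encodings below are made up): a 2-bit pre-decode field
# accompanies each instruction in the cache.  Assume every code for 16-bit
# instructions is already spoken for, so an *undefined* 16-bit instruction is
# rewritten to look like a 32-bit one and marked with a spare 32-bit code.

PD16 = {"alu": 0b00, "branch": 0b01, "load_store": 0b10, "other": 0b11}  # full
PD32 = {"normal": 0b00, "branch": 0b01, "undefined_16bit": 0b10}         # has room

def predecode(insn_bytes, is_16bit, is_undefined):
    if is_16bit and is_undefined:
        # Emulate a 32-bit instruction and borrow its pre-decode encoding space.
        fake_32bit = insn_bytes + b"\x00\x00"
        return fake_32bit, PD32["undefined_16bit"]
    if is_16bit:
        return insn_bytes, PD16["other"]
    return insn_bytes, PD32["normal"]

if __name__ == "__main__":
    # The altered instruction image plus its borrowed pre-decode code.
    print(predecode(b"\xde\xad", is_16bit=True, is_undefined=True))
```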
  • Publication number: 20070204142
    Abstract: A link stack in a processor is repaired in response to a procedure return address misprediction error. In one example, a link stack for use in a processor is repaired by detecting an error in a procedure return address value retrieved from the link stack and skipping a procedure return address value currently queued for retrieval from the link stack responsive to detecting the error. In one or more embodiments, a link stack circuit comprises a link stack and a link stack pointer. The link stack is configured to store a plurality of procedure return address values. The link stack pointer is configured to skip a procedure return address value currently queued for retrieval from the link stack responsive to an error detected in a procedure return address value previously retrieved from the link stack.
    Type: Application
    Filed: February 27, 2006
    Publication date: August 30, 2007
    Inventors: James Dieffenderfer, David Mandzak, Rodney Smith, Brian Stempel
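The link-stack repair of publication 20070204142 reduces to a pointer adjustment, sketched below in Python. The LinkStack class and its depth are assumptions; the essential move is that once a popped return address proves wrong, the entry currently queued for retrieval is skipped.

```python
# Simplified sketch (not the actual circuit): a small return-address stack whose
# pointer skips the next queued entry once a popped prediction proves wrong.

class LinkStack:
    def __init__(self, depth=8):
        self.entries = [0] * depth
        self.top = -1                        # index of the entry queued for retrieval

    def push(self, return_addr):
        self.top += 1
        self.entries[self.top % len(self.entries)] = return_addr

    def pop(self):
        addr = self.entries[self.top % len(self.entries)]
        self.top -= 1
        return addr

    def repair_after_misprediction(self):
        # The value just popped was wrong, so the entry now queued for retrieval
        # is suspect as well: skip it rather than hand it out.
        self.top -= 1

if __name__ == "__main__":
    ls = LinkStack()
    ls.push(0x1000)                          # return address of an outer call
    ls.push(0x2000)                          # return address of an inner call
    predicted, actual = ls.pop(), 0x3000     # the real return target disagrees
    if predicted != actual:
        ls.repair_after_misprediction()      # skip 0x1000 on the next retrieval
    print(ls.top)                            # -1: no trusted entry remains queued
```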
  • Publication number: 20070094475
    Abstract: A processing system may include a memory configured to store data in a plurality of pages, a TLB, and a memory cache including a plurality of cache lines. Each page in the memory may include a plurality of lines of memory. The memory cache may permit, when a virtual address is presented to the cache, a matching cache line to be identified from the plurality of cache lines, the matching cache line having a matching address that matches the virtual address. The memory cache may be configured to permit one or more page attributes of a page located at the matching address to be retrieved from the memory cache and not from the TLB, by further storing in each one of the cache lines a page attribute of the line of data stored in the cache line.
    Type: Application
    Filed: October 20, 2005
    Publication date: April 26, 2007
    Inventors: Jeffrey Bridges, James Dieffenderfer, Thomas Sartorius, Brian Stempel, Rodney Smith
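A behavioral sketch of publication 20070094475 is given below. The line size and the AttrCachingCache name are assumed; the sketch shows the page attributes of a line being returned from the cache itself on a hit, with no TLB access.

```python
# Behavioral sketch (structure names assumed): each cache line carries a copy of
# its page's attributes, so a cache hit can return both the data and the page
# attributes without consulting the TLB.
from dataclasses import dataclass

LINE_SHIFT = 6   # assumed 64-byte cache lines

@dataclass
class CacheLine:
    vtag: int
    data: bytes
    page_attrs: dict          # e.g. {"cacheable": True, "writable": False}

class AttrCachingCache:
    def __init__(self):
        self.lines = {}

    def fill(self, vaddr, data, page_attrs):
        # The page attributes are stored alongside the data in the cache line.
        self.lines[vaddr >> LINE_SHIFT] = CacheLine(vaddr >> LINE_SHIFT, data, page_attrs)

    def lookup(self, vaddr):
        line = self.lines.get(vaddr >> LINE_SHIFT)
        if line is None:
            return None                      # miss: would go to the TLB and memory
        return line.data, line.page_attrs    # hit: attributes come from the cache

if __name__ == "__main__":
    c = AttrCachingCache()
    c.fill(0x4040, b"\x90" * 64, {"cacheable": True, "writable": False})
    data, attrs = c.lookup(0x4044)           # hit: same 64-byte line
    print(attrs)                             # attributes served without a TLB access
```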
  • Publication number: 20070067574
    Abstract: A method of managing cache partitions provides a first pointer for higher priority writes and a second pointer for lower priority writes, and uses the first pointer to delimit the lower priority writes. For example, locked writes have greater priority than unlocked writes, and a first pointer may be used for locked writes, and a second pointer may be used for unlocked writes. The first pointer is advanced responsive to making locked writes, and its advancement thus defines a locked region and an unlocked region. The second pointer is advanced responsive to making unlocked writes. The second pointer also is advanced (or retreated) as needed to prevent it from pointing to locations already traversed by the first pointer. The first pointer thus delimits the unlocked region and allows the locked region to grow at the expense of the unlocked region.
    Type: Application
    Filed: September 21, 2005
    Publication date: March 22, 2007
    Inventors: Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius, Rodney Smith, Robert Clancy, Victor Augsburg
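The two-pointer partitioning of publication 20070067574 can be modeled as below. The eight-way layout and the pointer-update rules are simplifications chosen for illustration; they show locked writes growing a locked region while the unlocked pointer is kept out of it.

```python
# Simplified model (a single circular set of N ways is assumed): locked writes
# advance a first pointer, unlocked writes advance a second pointer, and the
# second pointer is nudged along whenever it trails into territory the locked
# pointer has already claimed.

class PartitionedWays:
    def __init__(self, n_ways=8):
        self.n = n_ways
        self.ways = [None] * n_ways
        self.locked_ptr = 0      # next way to receive a locked write
        self.unlocked_ptr = 0    # next way to receive an unlocked write

    def write_locked(self, value):
        self.ways[self.locked_ptr] = ("locked", value)
        self.locked_ptr = min(self.locked_ptr + 1, self.n - 1)  # locked region grows
        # Keep the unlocked pointer out of the locked region.
        self.unlocked_ptr = max(self.unlocked_ptr, self.locked_ptr)

    def write_unlocked(self, value):
        self.ways[self.unlocked_ptr] = ("unlocked", value)
        # Wrap within the shrinking unlocked region [locked_ptr, n).
        nxt = self.unlocked_ptr + 1
        self.unlocked_ptr = self.locked_ptr if nxt >= self.n else nxt

if __name__ == "__main__":
    p = PartitionedWays()
    p.write_unlocked("A"); p.write_locked("L0"); p.write_locked("L1")
    p.write_unlocked("B")
    print(p.ways)   # locked entries occupy the low ways; unlocked writes stay above them
```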
  • Publication number: 20060294346
    Abstract: In one or more embodiments, a processor includes a link return stack circuit used for storing branch return addresses, wherein a link return stack controller is configured to determine that one or more entries in the link return stack are invalid as being dependent on a mispredicted branch, and to reset the link return stack to a valid remaining entry, if any. In this manner, branch mispredictions cause dependent entries in the link return stack to be flushed from the link return stack, or otherwise invalidated, while preserving the remaining valid entries, if any, in the link return stack. In at least one embodiment, a branch information queue used for tracking predicted branches is configured to store a marker indicating whether a predicted branch has an associated entry in the link return stack, and it may store an index value identifying the specific, corresponding entry in the link return stack.
    Type: Application
    Filed: June 22, 2005
    Publication date: December 28, 2006
    Inventors: Brian Stempel, James Dieffenderfer, Thomas Sartorius, Rodney Smith
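The misprediction repair in publication 20060294346 is approximated below. Instead of the abstract's per-entry marker and index, this sketch snapshots the link-stack top in an assumed BranchInfoQueue and rolls back to it; the effect, flushing dependent entries while preserving remaining valid ones, is the same in spirit.

```python
# Condensed sketch (data structures assumed): a branch information queue records,
# for each predicted branch, where the link stack stood at prediction time.  On a
# misprediction, the link stack is reset to the newest entry that does not depend
# on the bad branch.

class BranchInfoQueue:
    def __init__(self):
        self.entries = []   # (branch_tag, link_stack_top_at_prediction)
    def record(self, branch_tag, link_stack_top):
        self.entries.append((branch_tag, link_stack_top))

class LinkReturnStack:
    def __init__(self):
        self.stack = []
    def push(self, addr):
        self.stack.append(addr)
    @property
    def top_index(self):
        return len(self.stack) - 1
    def reset_to(self, index):
        # Discard entries pushed after the surviving valid entry, if any.
        del self.stack[index + 1:]

def mispredict(biq, lrs, bad_branch_tag):
    for tag, top_at_prediction in biq.entries:
        if tag == bad_branch_tag:
            lrs.reset_to(top_at_prediction)   # flush dependent link-stack entries
            break

if __name__ == "__main__":
    lrs, biq = LinkReturnStack(), BranchInfoQueue()
    lrs.push(0x1000)                          # pushed before the branch: should survive
    biq.record("br42", lrs.top_index)         # predict branch br42
    lrs.push(0x2000)                          # pushed down the predicted path
    mispredict(biq, lrs, "br42")
    print(lrs.stack)                          # only 0x1000 (4096) survives
```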
  • Publication number: 20060277397
    Abstract: A microprocessor includes two branch history tables, and is configured to use a first one of the branch history tables for predicting branch instructions that are hits in a branch target cache, and to use a second one of the branch history tables for predicting branch instructions that are misses in the branch target cache. As such, the first branch history table is configured to have an access speed matched to that of the branch target cache, so that its prediction information is timely available relative to branch target cache hit detection, which may happen early in the microprocessor's instruction pipeline. The second branch history table thus need only be as fast as is required for providing timely prediction information in association with recognizing branch target cache misses as branch instructions, such as at the instruction decode stage(s) of the instruction pipeline.
    Type: Application
    Filed: June 2, 2005
    Publication date: December 7, 2006
    Inventors: Thomas Sartorius, Brian Stempel, Jeffrey Bridges, James Dieffenderfer, Rodney Smith
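Publication 20060277397's split between a fast and a slow branch history table can be caricatured as follows. The table sizes, counters, and indexing are invented; the sketch only shows the fast table backing BTAC hits and the slower table backing branches recognized later, at decode.

```python
# Schematic model only: two branch history tables, a small fast one consulted in
# step with the branch target address cache (BTAC), and a larger, slower one
# consulted later for branches only recognized at decode.

FAST_BHT_SIZE = 256     # assumed sizes
SLOW_BHT_SIZE = 4096

fast_bht = [1] * FAST_BHT_SIZE   # 2-bit saturating counters, weakly not-taken
slow_bht = [1] * SLOW_BHT_SIZE
btac = {}                        # branch PC -> target address

def predict(pc):
    if pc in btac:
        # BTAC hit: the direction must be known this early, so use the fast table.
        taken = fast_bht[pc % FAST_BHT_SIZE] >= 2
        return ("fast", taken, btac[pc] if taken else pc + 4)
    # BTAC miss: the branch is only discovered at decode; the slower table's
    # longer access latency is tolerable there.
    taken = slow_bht[pc % SLOW_BHT_SIZE] >= 2
    return ("slow", taken, None)

if __name__ == "__main__":
    btac[0x1000] = 0x2000
    print(predict(0x1000))   # ('fast', False, 4100) -- fall through to 0x1004
    print(predict(0x1008))   # ('slow', False, None)
```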
  • Publication number: 20060265572
    Abstract: A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.
    Type: Application
    Filed: May 18, 2005
    Publication date: November 23, 2006
    Inventors: Brian Stempel, Jeffrey Bridges, Rodney Smith, Thomas Sartorius
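The recycle-and-prefetch handling of publication 20060265572 is sketched below with a deliberately tiny cache model. The dictionaries and the two-byte split of the crossing instruction are assumptions; the point is the first-line fetch being retried after the second line has been prefetched into the cache.

```python
# Toy model (cache and pipeline detail omitted): when the second half of a
# line-crossing instruction misses, the first half's fetch is invalidated and
# recycled, and the second half's address is demoted to a prefetch that fills
# the cache for the retry.

cache = {0x1000: b"\x01" * 64}          # only the first line is resident
memory_lines = {0x1000: b"\x01" * 64, 0x1040: b"\x02" * 64}

def fetch_crossing(line_a, line_b):
    attempts = 0
    while True:
        attempts += 1
        first = cache.get(line_a)
        if line_b not in cache:
            # Second piece missing: recycle the first fetch, prefetch the second.
            cache[line_b] = memory_lines[line_b]   # prefetch fill, no data forwarded
            continue
        # Both halves hit: combine the tail of line A with the head of line B.
        second = cache[line_b]
        return first[-2:] + second[:2], attempts

if __name__ == "__main__":
    insn, attempts = fetch_crossing(0x1000, 0x1040)
    print(insn.hex(), "after", attempts, "passes")   # 01010202 after 2 passes
```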
  • Publication number: 20060265573
    Abstract: A method and apparatus for caching instructions for a processor having multiple operating states. At least two of the operating states of the processor support different instruction sets. A block of instructions may be retrieved from memory while the processor is operating in one of the states. The instructions may be pre-decoded in accordance with said one of the states and loaded into cache. The processor, or another entity, may be used to determine whether the current state of the processor is the same as said one of the states used to pre-decode the instructions when one of the pre-decoded instructions in the cache is needed by the processor.
    Type: Application
    Filed: May 18, 2005
    Publication date: November 23, 2006
    Inventors: Rodney Smith, Brian Stempel
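Publication 20060265573's state-aware pre-decode can be illustrated as below. The "ARM"/"THUMB" labels and the PredecodedICache structure are assumed stand-ins for the multiple operating states; the sketch shows a cached pre-decode being usable only when the processor is still in the state it was pre-decoded under.

```python
# Minimal sketch (state names assumed, loosely modeled on an ARM/Thumb-style
# split): cache lines remember which instruction-set state they were pre-decoded
# under, and a hit is only usable if the processor is still in that state.

class PredecodedICache:
    def __init__(self):
        self.lines = {}   # address -> (predecoded_info, state_at_fill)

    def fill(self, addr, raw_bytes, state):
        # Pre-decode according to the *current* state and remember that state.
        info = {"length": 2 if state == "THUMB" else 4, "raw": raw_bytes}
        self.lines[addr] = (info, state)

    def fetch(self, addr, current_state):
        hit = self.lines.get(addr)
        if hit is None:
            return None                      # ordinary miss
        info, state_at_fill = hit
        if state_at_fill != current_state:
            return None                      # stale pre-decode: treat as a miss
        return info

if __name__ == "__main__":
    ic = PredecodedICache()
    ic.fill(0x1000, b"\x00\xbf", state="THUMB")
    print(ic.fetch(0x1000, current_state="THUMB"))   # usable
    print(ic.fetch(0x1000, current_state="ARM"))     # None: pre-decoded for the wrong state
```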
  • Publication number: 20060200686
    Abstract: A processor capable of fetching and executing variable length instructions is described having instructions of at least two lengths. The processor operates in multiple modes. One of the modes restricts instructions that can be fetched and executed to the longer length instructions. An instruction cache is used for storing variable length instructions and their associated predecode bit fields in an instruction cache line and storing the instruction address and processor operating mode state information at the time of the fetch in a tag line. The processor operating mode state information indicates the program specified mode of operation of the processor. The processor fetches instructions from the instruction cache for execution.
    Type: Application
    Filed: March 4, 2005
    Publication date: September 7, 2006
    Inventors: Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Rodney Smith, Thomas Sartorius
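A rough model of publication 20060200686 follows. The field names and the 64-byte indexing are assumptions; the sketch simply extends the tag compare so that a fetch hits only when both the address and the recorded operating-mode state match.

```python
# Rough model (field names assumed): the tag of each instruction-cache line holds
# the processor's operating-mode state alongside the address, so a fetch only
# hits when both the address and the mode recorded at fill time match.

class TaggedICache:
    def __init__(self):
        self.lines = {}   # set index -> dict(tag=..., mode=..., insns=..., predecode=...)

    def fill(self, addr, mode, insns, predecode):
        self.lines[addr >> 6] = {"tag": addr >> 6, "mode": mode,
                                 "insns": insns, "predecode": predecode}

    def fetch(self, addr, mode):
        line = self.lines.get(addr >> 6)
        if line and line["tag"] == addr >> 6 and line["mode"] == mode:
            return line["insns"], line["predecode"]
        return None   # address or mode-state mismatch: miss

if __name__ == "__main__":
    ic = TaggedICache()
    ic.fill(0x2000, mode="32bit-only", insns=b"...", predecode=[0b01])
    print(ic.fetch(0x2000, mode="32bit-only") is not None)   # True
    print(ic.fetch(0x2000, mode="mixed-length"))             # None: mode state differs
```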
  • Publication number: 20060200655
    Abstract: A pipelined processor comprises an instruction cache (iCache), a branch target address cache (BTAC), and processing stages, including a stage to fetch from the iCache and the BTAC. To compensate for the number of cycles needed to fetch a branch target address from the BTAC, the fetch from the BTAC leads the fetch of a branch instruction from the iCache by an amount related to the cycles needed to fetch from the BTAC. Disclosed examples either decrement a write address of the BTAC or increment a fetch address of the BTAC, by an amount essentially corresponding to one less than the cycles needed for a BTAC fetch.
    Type: Application
    Filed: March 4, 2005
    Publication date: September 7, 2006
    Inventors: Rodney Smith, Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius
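The address-skewing arithmetic of publication 20060200655 is sketched below using assumed numbers (a 2-cycle BTAC, a 1-cycle iCache, and 4-byte fetch granules). It follows the decrement-the-write-address option: the entry is stored one fetch granule per extra BTAC cycle ahead of the branch.

```python
# Arithmetic sketch only (latencies and fetch width are assumptions): so that a
# branch target arrives from the BTAC in step with the branch itself arriving
# from the instruction cache, the BTAC is indexed ahead of the iCache by one
# less than the cycles needed for a BTAC fetch.

BTAC_LATENCY = 2    # cycles for a BTAC read (assumed)
FETCH_BYTES = 4     # bytes fetched per cycle (assumed)
LEAD = (BTAC_LATENCY - 1) * FETCH_BYTES

btac = {}

def btac_write(branch_addr, target):
    # Option from the abstract: decrement the write address so the entry is
    # found LEAD bytes early in the sequential fetch stream.
    btac[branch_addr - LEAD] = target

def lookup_stream(start, count):
    # A BTAC lookup launched with fetch address A completes alongside the iCache
    # data for address A + LEAD, so an early hit lines up with the branch.
    for i in range(count):
        addr = start + i * FETCH_BYTES
        yield addr, btac.get(addr)

if __name__ == "__main__":
    btac_write(0x1008, target=0x2000)          # the branch itself sits at 0x1008
    for addr, early_target in lookup_stream(0x1000, 4):
        print(hex(addr), early_target)
    # The BTAC hit appears at 0x1004; its result is ready just as the branch at
    # 0x1008 emerges from the instruction cache.
```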
  • Publication number: 20060149981
    Abstract: In a pipelined processor, a pre-decoder in advance of an instruction cache calculates the branch target address (BTA) of PC-relative and absolute address branch instructions. The pre-decoder compares the BTA with the branch instruction address (BIA) to determine whether the target and instruction are in the same memory page. A branch target same page (BTSP) bit indicating this is written to the cache and associated with the instruction. When the branch is executed and evaluated as taken, a TLB access to check permission attributes for the BTA is suppressed if the BTA is in the same page as the BIA, as indicated by the BTSP bit. This reduces power consumption as the TLB access is suppressed and the BTA/BIA comparison is only performed once, when the branch instruction is first fetched. Additionally, the pre-decoder removes the BTA/BIA comparison from the BTA generation and selection critical path.
    Type: Application
    Filed: December 2, 2004
    Publication date: July 6, 2006
    Inventors: James Dieffenderfer, Thomas Sartorius, Rodney Smith, Brian Stempel
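Publication 20060149981's same-page check can be expressed compactly; the sketch below assumes 4 KB pages and a global lookup counter purely for illustration. It computes the branch target same page (BTSP) bit once at pre-decode and uses it to skip the TLB permission check for taken branches.

```python
# Small sketch (page size assumed 4 KB): at pre-decode time a "branch target
# same page" (BTSP) bit is computed once and stored with the instruction; at
# execute time a taken branch skips the TLB permission lookup whenever BTSP says
# the target shares the branch's page.

PAGE_SHIFT = 12
tlb_lookups = 0   # counts the permission checks actually performed

def predecode_branch(branch_addr, pc_relative_offset):
    target = branch_addr + pc_relative_offset
    btsp = (target >> PAGE_SHIFT) == (branch_addr >> PAGE_SHIFT)
    return {"addr": branch_addr, "target": target, "btsp": btsp}

def execute_taken(branch):
    global tlb_lookups
    if not branch["btsp"]:
        tlb_lookups += 1          # cross-page target: must check permissions in the TLB
    return branch["target"]

if __name__ == "__main__":
    near = predecode_branch(0x1000, +0x40)      # stays within the same 4 KB page
    far = predecode_branch(0x1000, +0x4000)     # leaves the page
    execute_taken(near); execute_taken(far)
    print("TLB permission checks:", tlb_lookups)   # 1, only for the far branch
```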
  • Publication number: 20060123326
    Abstract: In a pipelined processor where instructions are pre-decoded prior to being stored in a cache, an incorrectly pre-decoded instruction is detected during execution in the pipeline. The corresponding instruction is invalidated in the cache, and the instruction is forced to evaluate as a branch instruction. In particular, the branch instruction is evaluated as “mispredicted not taken” with a branch target address of the incorrectly pre-decoded instruction's address. This, with the invalidated cache line, causes the incorrectly pre-decoded instruction to be re-fetched from memory with a precise address. The re-fetched instruction is then correctly pre-decoded, written to the cache, and executed.
    Type: Application
    Filed: November 22, 2004
    Publication date: June 8, 2006
    Inventors: Rodney Smith, Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius
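The recovery trick of publication 20060123326 is sketched below with the pipeline reduced to a loop. The dictionaries standing in for the cache and memory are assumptions; the sketch shows the bad line being invalidated and the instruction being treated as a branch back to its own address so it is re-fetched and re-pre-decoded.

```python
# Sketch of the recovery idea (pipeline machinery heavily simplified): a wrongly
# pre-decoded instruction is turned into a branch that "mispredicts not taken"
# toward its own address, so the normal branch-recovery flush re-fetches it from
# memory with the cache line invalidated.

icache = {0x1000: {"insn": "bogus", "predecode": "WRONG"}}
memory = {0x1000: {"insn": "real", "predecode": None}}

def execute(addr):
    entry = icache.get(addr) or memory[addr]
    if entry["predecode"] == "WRONG":
        icache.pop(addr, None)                 # invalidate the bad cache line
        return ("redirect", addr)              # act like a mispredicted branch to addr
    return ("done", entry["insn"])

def run(addr):
    while True:
        outcome, value = execute(addr)
        if outcome == "done":
            return value
        # Branch recovery: re-fetch from memory, pre-decode correctly, refill the cache.
        icache[value] = dict(memory[value], predecode="OK")
        addr = value

if __name__ == "__main__":
    print(run(0x1000))   # 'real' -- executed only after the corrective re-fetch
```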
  • Publication number: 20060048011
    Abstract: A method and system for monitoring the real-time performance of software running on a microprocessor system. Debug hardware is used to select a range of instructions or events to be monitored by a performance monitor internal to the microprocessor system. A comparison is made between each event and start and stop events identified in the debug hardware. The performance monitor is enabled by the debug hardware when events occur within the range defined by the debug hardware. Use of the debug hardware for enabling performance monitoring avoids any overhead associated with generating interrupts or with additional code in the application program.
    Type: Application
    Filed: August 26, 2004
    Publication date: March 2, 2006
    Applicant: International Business Machines Corporation
    Inventors: James Dieffenderfer, Sanjay Patel, Brian Stempel
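A conceptual sketch of publication 20060048011 follows; the DebugGatedCounter name and the single start/stop comparator pair are invented. It shows debug-style comparators enabling and disabling a performance counter without interrupts or added application code.

```python
# Conceptual sketch only (register names invented): debug-style address
# comparators gate a performance counter, so counting starts and stops without
# interrupts or instrumentation code in the application.

class DebugGatedCounter:
    def __init__(self, start_addr, stop_addr):
        self.start_addr, self.stop_addr = start_addr, stop_addr
        self.enabled = False
        self.count = 0            # e.g. instructions retired while enabled

    def observe(self, pc):
        # The debug comparators watch every fetched address.
        if pc == self.start_addr:
            self.enabled = True
        if self.enabled:
            self.count += 1
        if pc == self.stop_addr:
            self.enabled = False

if __name__ == "__main__":
    mon = DebugGatedCounter(start_addr=0x4000, stop_addr=0x4010)
    for pc in range(0x3FF8, 0x4020, 4):
        mon.observe(pc)
    print(mon.count)   # only the instructions from start through stop are counted
```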
  • Publication number: 20050216703
    Abstract: A method and apparatus for executing instructions in a pipeline processor. The method decreases the latency between an instruction cache and a pipeline processor when bubbles occur in the processing stream due to execution of a branch correction, or when an interrupt changes the sequence of an instruction stream. The latency is reduced when a decode stage for detecting branch prediction and a related instruction queue location hold invalid data representing a bubble in the processing stream. Instructions for execution are inserted in parallel into the decode stage and the instruction queue, thereby reducing the effective length of the pipeline by one cycle.
    Type: Application
    Filed: March 26, 2004
    Publication date: September 29, 2005
    Applicant: International Business Machines Corporation
    Inventors: James Dieffenderfer, Richard Doing, Brian Stempel, Steven Testa, Kenichi Tsuchiya
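Finally, the bubble-filling idea of publication 20050216703 is sketched below. The FrontEnd class and its deliver method are assumptions; the sketch shows an instruction being written into the decode stage and the instruction queue in parallel when both hold a bubble, rather than passing through the queue first.

```python
# Simplified dataflow sketch (stage names assumed): when both the decode stage
# and its companion instruction-queue slot hold a bubble, a newly fetched
# instruction is written into both at once instead of trickling through the
# queue first, saving a cycle after a branch correction or interrupt.

class FrontEnd:
    def __init__(self):
        self.instruction_queue = []   # waiting instructions, oldest first
        self.decode_stage = None      # None models a bubble

    def deliver(self, insn):
        if self.decode_stage is None and not self.instruction_queue:
            # Bubble in both places: bypass the queue and decode immediately,
            # while also recording the instruction in the queue slot.
            self.decode_stage = insn
            self.instruction_queue.append(insn)
            return "inserted in parallel (1 cycle saved)"
        self.instruction_queue.append(insn)
        return "queued behind older work"

if __name__ == "__main__":
    fe = FrontEnd()
    print(fe.deliver("ADD r1, r2, r3"))   # pipeline empty after a branch correction
    print(fe.deliver("SUB r4, r5, r6"))   # normal path: waits in the queue
```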