Patents by Inventor Brian Stempel
Brian Stempel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11593117Abstract: Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.Type: GrantFiled: June 29, 2018Date of Patent: February 28, 2023Assignee: Qualcomm IncorporatedInventors: Harsh Thakker, Thomas Philip Speier, Rodney Wayne Smith, Kevin Jaget, James Norris Dieffenderfer, Michael Morrow, Pritha Ghoshal, Yusuf Cagatay Tekmen, Brian Stempel, Sang Hoon Lee, Manish Garg
-
Patent number: 11061822Abstract: A method, apparatus, and system for reducing pipeline stalls due to address translation misses is presented. An apparatus comprises a memory access instruction pipeline, a translation lookaside buffer coupled to the memory access instruction pipeline, and a TLB miss queue coupled to both the TLB and the memory access instruction pipeline. The TLB miss queue is configured to selectively store a first memory access instruction that has been removed from the memory access instruction pipeline as a result of the first memory access instruction missing in the TLB along with information associated with the first memory access instruction. The TLB miss queue is further configured to reintroduce the first memory access instruction to the memory access instruction pipeline associated with a return of an address translation related to the first memory access instruction.Type: GrantFiled: August 27, 2018Date of Patent: July 13, 2021Assignee: Qualcomm IncorporatedInventors: Pritha Ghoshal, Niket Choudhary, Ravi Rajagopalan, Patrick Eibl, Brian Stempel, David Scott Ray, Thomas Philip Speier
-
Publication number: 20200065260Abstract: A method, apparatus, and system for reducing pipeline stalls due to address translation misses is presented. An apparatus comprises a memory access instruction pipeline, a translation lookaside buffer coupled to the memory access instruction pipeline, and a TLB miss queue coupled to both the TLB and the memory access instruction pipeline. The TLB miss queue is configured to selectively store a first memory access instruction that has been removed from the memory access instruction pipeline as a result of the first memory access instruction missing in the TLB along with information associated with the first memory access instruction. The TLB miss queue is further configured to reintroduce the first memory access instruction to the memory access instruction pipeline associated with a return of an address translation related to the first memory access instruction.Type: ApplicationFiled: August 27, 2018Publication date: February 27, 2020Inventors: Pritha GHOSHAL, Niket CHOUDHARY, Ravi RAJAGOPALAN, Patrick EIBL, Brian STEMPEL, David Scott Ray, Thomas Philip SPEIER
-
Publication number: 20200004550Abstract: Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.Type: ApplicationFiled: June 29, 2018Publication date: January 2, 2020Inventors: Harsh THAKKER, Thomas Philip SPEIER, Rodney Wayne SMITH, Kevin JAGET, James Norris DIEFFENDERFER, Michael MORROW, Pritha GHOSHAL, Yusuf Cagatay TEKMEN, Brian STEMPEL, Sang Hoon LEE, Manish GARG
-
Patent number: 10318436Abstract: A translation lookaside buffer (TLB) index valid bit is set in a first line of a virtually indexed, virtually tagged (VIVT) cache. The first line of the VIVT cache is associated with a first TLB entry which stores a virtual address to physical address translation for the first cache line. The TLB index valid bit of the first line is cleared upon determining that the translation is no longer stored in the first TLB entry. An indication of a received invalidation instruction is stored. When a context synchronization instruction is received, the first line of the VIVT cache is cleared based on the TLB index valid bit being cleared and the stored indication of the invalidate instruction.Type: GrantFiled: July 25, 2017Date of Patent: June 11, 2019Assignee: QUALCOMM IncorporatedInventors: William McAvoy, Brian Stempel, Spencer Williams, Robert Douglas Clancy, Michael Scott McIlvaine, Thomas Philip Speier
-
Publication number: 20190034349Abstract: A translation lookaside buffer (TLB) index valid bit is set in a first line of a virtually indexed, virtually tagged (VIVT) cache. The first line of the VIVT cache is associated with a first TLB entry which stores a virtual address to physical address translation for the first cache line. The TLB index valid bit of the first line is cleared upon determining that the translation is no longer stored in the first TLB entry. An indication of a received invalidation instruction is stored. When a context synchronization instruction is received, the first line of the VIVT cache is cleared based on the TLB index valid bit being cleared and the stored indication of the invalidate instruction.Type: ApplicationFiled: July 25, 2017Publication date: January 31, 2019Inventors: William MCAVOY, Brian STEMPEL, Spencer WILLIAMS, Robert Douglas CLANCY, Michael Scott MCILVAINE, Thomas Philip SPEIER
-
Publication number: 20070260854Abstract: A pre-decoder in a variable instruction length processor indicates properties of instructions in pre-decode bits stored in an instruction cache with the instructions. When all the encodings of pre-decode bits associate with one length instruction are defined, a property of an instruction of that length may be indicated by altering the instruction to emulate an instruction of a different length, and encoding the property in the pre-decode bits associated with instructions of the different length. One example of a property that may be so indicated is an undefined instruction.Type: ApplicationFiled: May 4, 2006Publication date: November 8, 2007Inventors: Rodney Smith, Brian Stempel
-
Publication number: 20070204142Abstract: A link stack in a processor is repaired in response to a procedure return address misprediction error. In one example, a link stack for use in a processor is repaired by detecting an error in a procedure return address value retrieved from the link stack and skipping a procedure return address value currently queued for retrieval from the link stack responsive to detecting the error. In one or more embodiments, a link stack circuit comprises a link stack and a link stack pointer. The link stack is configured to store a plurality of procedure return address values. The link stack pointer is configured to skip a procedure return address value currently queued for retrieval from the link stack responsive to an error detected in a procedure return address value previously retrieved from the link stack.Type: ApplicationFiled: February 27, 2006Publication date: August 30, 2007Inventors: James Dieffenderfer, David Mandzak, Rodney Smith, Brian Stempel
-
Publication number: 20070094475Abstract: A processing system may include a memory configured to store data in a plurality of pages, a TLB, and a memory cache including a plurality of cache lines. Each page in the memory may include a plurality of lines of memory. The memory cache may permit, when a virtual address is presented to the cache, a matching cache line to be identified from the plurality of cache lines, the matching cache line having a matching address that matches the virtual address. The memory cache may be configured to permit one or more page attributes of a page located at the matching address to be retrieved from the memory cache and not from the TLB, by further storing in each one of the cache lines a page attribute of the line of data stored in the cache line.Type: ApplicationFiled: October 20, 2005Publication date: April 26, 2007Inventors: Jeffrey Bridges, James Dieffenderfer, Thomas Sartorius, Brian Stempel, Rodney Smith
-
Publication number: 20070067574Abstract: A method of managing cache partitions provides a first pointer for higher priority writes and a second pointer for lower priority writes, and uses the first pointer to delimit the lower priority writes. For example, locked writes have greater priority than unlocked writes, and a first pointer may be used for locked writes, and a second pointer may be used for unlocked writes. The first pointer is advanced responsive to making locked writes, and its advancement thus defines a locked region and an unlocked region. The second pointer is advanced responsive to making unlocked writes. The second pointer also is advanced (or retreated) as needed to prevent it from pointing to locations already traversed by the first pointer. Thus, the pointer delimits the unlocked region and allows the locked region to grow at the expense of the unlocked region.Type: ApplicationFiled: September 21, 2005Publication date: March 22, 2007Inventors: Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius, Rodney Smith, Robert Clancy, Victor Augsburg
-
Publication number: 20060294346Abstract: In one or more embodiments, a processor includes a link return stack circuit used for storing branch return addresses, wherein a link return stack controller is configured to determine that one or more entries in the link return stack are invalid as being dependent on a mispredicted branch, and to reset the link return stack to a valid remaining entry, if any. In this manner, branch mispredictions cause dependent entries in the link return stack to be flushed from the link return stack, or otherwise invalidated, while preserving the remaining valid entries, if any, in the link return stack. In at least one embodiment, a branch information queue used for tracking predicted branches is configured to store a marker indicating whether a predicted branch has an associated entry in the link return stack, and it may store an index value identifying the specific, corresponding entry in the link return stack.Type: ApplicationFiled: June 22, 2005Publication date: December 28, 2006Inventors: Brian Stempel, James Dieffenderfer, Thomas Sartorius, Rodney Smith
-
Publication number: 20060277397Abstract: A microprocessor includes two branch history tables, and is configured to use a first one of the branch history tables for predicting branch instructions that are hits in a branch target cache, and to use a second one of the branch history tables for predicting branch instructions that are misses in the branch target cache. As such, the first branch history table is configured to have an access speed matched to that of the branch target cache, so that its prediction information is timely available relative to branch target cache hit detection, which may happen early in the microprocessor's instruction pipeline. The second branch history table thus need only be as fast as is required for providing timely prediction information in association with recognizing branch target cache misses as branch instructions, such as at the instruction decode stage(s) of the instruction pipeline.Type: ApplicationFiled: June 2, 2005Publication date: December 7, 2006Inventors: Thomas Sartorius, Brian Stempel, Jeffrey Bridges, James Dieffenderfer, Rodney Smith
-
Publication number: 20060265572Abstract: A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction are can be fetched from the cache and combined in the normal manner.Type: ApplicationFiled: May 18, 2005Publication date: November 23, 2006Inventors: Brian Stempel, Jeffrey Bridges, Rodney Smith, Thomas Sartorius
-
Publication number: 20060265573Abstract: A method and apparatus for caching instructions for a processor having multiple operating states. At least two of the operating states of the processor supporting different instruction sets. A block of instructions may be retrieved from memory while the processor is operating in one of the states. The instructions may be pre-decoded in accordance with said one of the states and loaded into cache. The processor, or another entity, may be used to determine whether the current state of the processor is the same as said one of the states used to pre-decode the instructions when one of the pre-decoded instructions in the cache is needed by the processor.Type: ApplicationFiled: May 18, 2005Publication date: November 23, 2006Inventors: Rodney Smith, Brian Stempel
-
Publication number: 20060200686Abstract: A processor capable of fetching and executing variable length instructions is described having instructions of at least two lengths. The processor operates in multiple modes. One of the modes restricts instructions that can be fetched and executed to the longer length instructions. An instruction cache is used for storing variable length instructions and their associated predecode bit fields in an instruction cache line and storing the instruction address and processor operating mode state information at the time of the fetch in a tag line. The processor operating mode state information indicates the program specified mode of operation of the processor. The processor fetches instructions from the instruction cache for execution.Type: ApplicationFiled: March 4, 2005Publication date: September 7, 2006Inventors: Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Rodney Smith, Thomas Sartorius
-
Publication number: 20060200655Abstract: A pipelined processor comprises an instruction cache (iCache), a branch target address cache (BTAC), and processing stages, including a stage to fetch from the iCache and the BTAC. To compensate for the number of cycles needed to fetch a branch target address from the BTAC, the fetch from the BTAC leads the fetch of a branch instruction from the iCache by an amount related to the cycles needed to fetch from the BTAC. Disclosed examples either decrement a write address of the BTAC or increment a fetch address of the BTAC, by an amount essentially corresponding to one less than the cycles needed for a BTAC fetch.Type: ApplicationFiled: March 4, 2005Publication date: September 7, 2006Inventors: Rodney Smith, Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius
-
Publication number: 20060149981Abstract: In a pipelined processor, a pre-decoder in advance of an instruction cache calculates the branch target address (BTA) of PC-relative and absolute address branch instructions. The pre-decoder compares the BTA with the branch instruction address (BIA) to determine whether the target and instruction are in the same memory page. A branch target same page (BTSP) bit indicating this is written to the cache and associated with the instruction. When the branch is executed and evaluated as taken, a TLB access to check permission attributes for the BTA is suppressed if the BTA is in the same page as the BIA, as indicated by the BTSP bit. This reduces power consumption as the TLB access is suppressed and the BTA/BIA comparison is only performed once, when the branch instruction is first fetched. Additionally, the pre-decoder removes the BTA/BIA comparison from the BTA generation and selection critical path.Type: ApplicationFiled: December 2, 2004Publication date: July 6, 2006Inventors: James Dieffenderfer, Thomas Sartorius, Rodney Smith, Brian Stempel
-
Publication number: 20060123326Abstract: In a pipelined processor where instructions are pre-decoded prior to being stored in a cache, an incorrectly pre-decoded instruction is detected during execution in the pipeline. The corresponding instruction is invalidated in the cache, and the instruction is forced to evaluate as a branch instruction. In particular, the branch instruction is evaluated as “mispredicted not taken” with a branch target address of the incorrectly pre-decoded instruction's address. This, with the invalidated cache line, causes the incorrectly pre-decoded instruction to be re-fetched from memory with a precise address. The re-fetched instruction is then correctly pre-decoded, written to the cache, and executed.Type: ApplicationFiled: November 22, 2004Publication date: June 8, 2006Inventors: Rodney Smith, Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius
-
Publication number: 20060048011Abstract: A method and system for monitoring the real-time of software running on a microprocessor system. Debug hardware is used to select a range of instructions or events to be monitored by a performance monitor interval with the microprocessor system. A comparison is made between each event and start and stop events are identified in the debug hardware. The performance monitor is enabled by the debug hardware, when events occur within the range defined by the debug hardware. Use of the debug hardware for enabling performance monitoring avoids any overhead associated with generating interrupts, or additional code in the application program.Type: ApplicationFiled: August 26, 2004Publication date: March 2, 2006Applicant: International Business Machines CorporationInventors: James Dieffenderfer, Sanjay Patel, Brian Stempel
-
Publication number: 20050216703Abstract: A method and apparatus for executing instructions in a pipeline processor. The method decreases the latency between an instruction cache and a pipeline processor when bubbles occur in the processing stream due to an execution of a branch correction, or when an interrupt changes the sequence of an instruction stream. The latency is reduced when a decode stage for detecting branch prediction and a related instruction queue location have invalid data representing a bubble in the processing stream. Instructions for execution are inserted in parallel into the decode stage and instruction queue, thereby reducing by one cycle time the length of the pipeline stage.Type: ApplicationFiled: March 26, 2004Publication date: September 29, 2005Applicant: International Business Machines CorporationInventors: James Dieffenderfer, Richard Doing, Brian Stempel, Steven Testa, Kenichi Tsuchiya