Patents by Inventor Brian Stempel

Brian Stempel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Combining load or store instructions

Patent number: 11593117

Abstract: Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.

Type: Grant

Filed: June 29, 2018

Date of Patent: February 28, 2023

Assignee: Qualcomm Incorporated

Inventors: Harsh Thakker, Thomas Philip Speier, Rodney Wayne Smith, Kevin Jaget, James Norris Dieffenderfer, Michael Morrow, Pritha Ghoshal, Yusuf Cagatay Tekmen, Brian Stempel, Sang Hoon Lee, Manish Garg
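
The combining condition described in the abstract above can be illustrated with a small sketch. This is not the patented logic. The names `MemAccess` and `can_combine`, the 64-byte cache line, the 16-byte bus width, and the `base_value` parameter are all assumptions made for illustration. The sketch treats two accesses as combinable when they share a base register and direction, their combined span fits the assumed bus width, and the span stays within one cache line.

```python
from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # assumed cache line size, not taken from the patent
BUS_WIDTH_BYTES = 16   # assumed data bus width, not taken from the patent


@dataclass(frozen=True)
class MemAccess:
    """Hypothetical model of one load or store instruction."""
    base_reg: str  # name of the base register, e.g. "x1"
    offset: int    # byte offset added to the base register
    size: int      # number of bytes accessed
    is_load: bool  # True for a load, False for a store


def can_combine(a: MemAccess, b: MemAccess, base_value: int) -> bool:
    """Return True if a and b could be replaced by one wider access.

    base_value is the (assumed known) content of the shared base register.
    """
    # Both accesses must use the same base register and the same direction.
    if a.base_reg != b.base_reg or a.is_load != b.is_load:
        return False
    # Byte span covered by both accesses, including any gap between them.
    lo = min(a.offset, b.offset)
    hi = max(a.offset + a.size, b.offset + b.size)
    if hi - lo > BUS_WIDTH_BYTES:
        return False
    # The whole span must sit on a single cache line.
    first_line = (base_value + lo) // CACHE_LINE_BYTES
    last_line = (base_value + hi - 1) // CACHE_LINE_BYTES
    return first_line == last_line
```

For example, two 4-byte loads at offsets 0 and 4 from the same base register qualify when the base is line-aligned, but not when the pair straddles a line boundary.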
Method, apparatus, and system for reducing pipeline stalls due to address translation misses

Patent number: 11061822

Abstract: A method, apparatus, and system for reducing pipeline stalls due to address translation misses is presented. An apparatus comprises a memory access instruction pipeline, a translation lookaside buffer coupled to the memory access instruction pipeline, and a TLB miss queue coupled to both the TLB and the memory access instruction pipeline. The TLB miss queue is configured to selectively store a first memory access instruction that has been removed from the memory access instruction pipeline as a result of the first memory access instruction missing in the TLB along with information associated with the first memory access instruction. The TLB miss queue is further configured to reintroduce the first memory access instruction to the memory access instruction pipeline associated with a return of an address translation related to the first memory access instruction.

Type: Grant

Filed: August 27, 2018

Date of Patent: July 13, 2021

Assignee: Qualcomm Incorporated

Inventors: Pritha Ghoshal, Niket Choudhary, Ravi Rajagopalan, Patrick Eibl, Brian Stempel, David Scott Ray, Thomas Philip Speier
METHOD, APPARATUS, AND SYSTEM FOR REDUCING PIPELINE STALLS DUE TO ADDRESS TRANSLATION MISSES

Publication number: 20200065260

Abstract: A method, apparatus, and system for reducing pipeline stalls due to address translation misses is presented. An apparatus comprises a memory access instruction pipeline, a translation lookaside buffer coupled to the memory access instruction pipeline, and a TLB miss queue coupled to both the TLB and the memory access instruction pipeline. The TLB miss queue is configured to selectively store a first memory access instruction that has been removed from the memory access instruction pipeline as a result of the first memory access instruction missing in the TLB along with information associated with the first memory access instruction. The TLB miss queue is further configured to reintroduce the first memory access instruction to the memory access instruction pipeline associated with a return of an address translation related to the first memory access instruction.

Type: Application

Filed: August 27, 2018

Publication date: February 27, 2020

Inventors: Pritha GHOSHAL, Niket CHOUDHARY, Ravi RAJAGOPALAN, Patrick EIBL, Brian STEMPEL, David Scott RAY, Thomas Philip SPEIER
COMBINING LOAD OR STORE INSTRUCTIONS

Publication number: 20200004550

Abstract: Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.

Type: Application

Filed: June 29, 2018

Publication date: January 2, 2020

Inventors: Harsh THAKKER, Thomas Philip SPEIER, Rodney Wayne SMITH, Kevin JAGET, James Norris DIEFFENDERFER, Michael MORROW, Pritha GHOSHAL, Yusuf Cagatay TEKMEN, Brian STEMPEL, Sang Hoon LEE, Manish GARG
Precise invalidation of virtually tagged caches

Patent number: 10318436

Abstract: A translation lookaside buffer (TLB) index valid bit is set in a first line of a virtually indexed, virtually tagged (VIVT) cache. The first line of the VIVT cache is associated with a first TLB entry which stores a virtual address to physical address translation for the first cache line. The TLB index valid bit of the first line is cleared upon determining that the translation is no longer stored in the first TLB entry. An indication of a received invalidation instruction is stored. When a context synchronization instruction is received, the first line of the VIVT cache is cleared based on the TLB index valid bit being cleared and the stored indication of the invalidate instruction.

Type: Grant

Filed: July 25, 2017

Date of Patent: June 11, 2019

Assignee: QUALCOMM Incorporated

Inventors: William McAvoy, Brian Stempel, Spencer Williams, Robert Douglas Clancy, Michael Scott McIlvaine, Thomas Philip Speier
PRECISE INVALIDATION OF VIRTUALLY TAGGED CACHES

Publication number: 20190034349

Abstract: A translation lookaside buffer (TLB) index valid bit is set in a first line of a virtually indexed, virtually tagged (VIVT) cache. The first line of the VIVT cache is associated with a first TLB entry which stores a virtual address to physical address translation for the first cache line. The TLB index valid bit of the first line is cleared upon determining that the translation is no longer stored in the first TLB entry. An indication of a received invalidation instruction is stored. When a context synchronization instruction is received, the first line of the VIVT cache is cleared based on the TLB index valid bit being cleared and the stored indication of the invalidate instruction.

Type: Application

Filed: July 25, 2017

Publication date: January 31, 2019

Inventors: William MCAVOY, Brian STEMPEL, Spencer WILLIAMS, Robert Douglas CLANCY, Michael Scott MCILVAINE, Thomas Philip SPEIER
PRE-DECODING VARIABLE LENGTH INSTRUCTIONS

Publication number: 20070260854

Abstract: A pre-decoder in a variable instruction length processor indicates properties of instructions in pre-decode bits stored in an instruction cache with the instructions. When all the encodings of pre-decode bits associated with one length instruction are defined, a property of an instruction of that length may be indicated by altering the instruction to emulate an instruction of a different length, and encoding the property in the pre-decode bits associated with instructions of the different length. One example of a property that may be so indicated is an undefined instruction.

Type: Application

Filed: May 4, 2006

Publication date: November 8, 2007

Inventors: Rodney Smith, Brian Stempel
Method and apparatus for repairing a link stack

Publication number: 20070204142

Abstract: A link stack in a processor is repaired in response to a procedure return address misprediction error. In one example, a link stack for use in a processor is repaired by detecting an error in a procedure return address value retrieved from the link stack and skipping a procedure return address value currently queued for retrieval from the link stack responsive to detecting the error. In one or more embodiments, a link stack circuit comprises a link stack and a link stack pointer. The link stack is configured to store a plurality of procedure return address values. The link stack pointer is configured to skip a procedure return address value currently queued for retrieval from the link stack responsive to an error detected in a procedure return address value previously retrieved from the link stack.

Type: Application

Filed: February 27, 2006

Publication date: August 30, 2007

Inventors: James Dieffenderfer, David Mandzak, Rodney Smith, Brian Stempel
Caching memory attribute indicators with cached memory data field

Publication number: 20070094475

Abstract: A processing system may include a memory configured to store data in a plurality of pages, a TLB, and a memory cache including a plurality of cache lines. Each page in the memory may include a plurality of lines of memory. The memory cache may permit, when a virtual address is presented to the cache, a matching cache line to be identified from the plurality of cache lines, the matching cache line having a matching address that matches the virtual address. The memory cache may be configured to permit one or more page attributes of a page located at the matching address to be retrieved from the memory cache and not from the TLB, by further storing in each one of the cache lines a page attribute of the line of data stored in the cache line.

Type: Application

Filed: October 20, 2005

Publication date: April 26, 2007

Inventors: Jeffrey Bridges, James Dieffenderfer, Thomas Sartorius, Brian Stempel, Rodney Smith
Method and apparatus for managing cache partitioning

Publication number: 20070067574

Abstract: A method of managing cache partitions provides a first pointer for higher priority writes and a second pointer for lower priority writes, and uses the first pointer to delimit the lower priority writes. For example, locked writes have greater priority than unlocked writes, and a first pointer may be used for locked writes, and a second pointer may be used for unlocked writes. The first pointer is advanced responsive to making locked writes, and its advancement thus defines a locked region and an unlocked region. The second pointer is advanced responsive to making unlocked writes. The second pointer also is advanced (or retreated) as needed to prevent it from pointing to locations already traversed by the first pointer. Thus, the pointer delimits the unlocked region and allows the locked region to grow at the expense of the unlocked region.

Type: Application

Filed: September 21, 2005

Publication date: March 22, 2007

Inventors: Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius, Rodney Smith, Robert Clancy, Victor Augsburg
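
The two-pointer scheme in the abstract above can be sketched as follows. This is a simplified model under stated assumptions, not the patented circuit: the class name `PartitionedWays`, the idea of numbering cache ways from 0, and the wrap-around policy for the unlocked pointer are all hypothetical choices made for illustration.

```python
class PartitionedWays:
    """Hypothetical model: ways [0, locked_ptr) are locked, the rest unlocked."""

    def __init__(self, num_ways: int) -> None:
        self.num_ways = num_ways
        self.locked_ptr = 0    # next way to receive a locked write
        self.unlocked_ptr = 0  # next way to receive an unlocked write

    def write_locked(self) -> int:
        """Place a locked write and grow the locked region by one way."""
        if self.locked_ptr >= self.num_ways:
            raise RuntimeError("all ways are already locked")
        way = self.locked_ptr
        self.locked_ptr += 1
        # Keep the unlocked pointer out of ways the locked pointer has passed.
        if self.unlocked_ptr < self.locked_ptr:
            self.unlocked_ptr = self.locked_ptr
        return way

    def write_unlocked(self) -> int:
        """Place an unlocked write somewhere in the unlocked region."""
        if self.locked_ptr >= self.num_ways:
            raise RuntimeError("no unlocked ways remain")
        way = self.unlocked_ptr
        self.unlocked_ptr += 1
        # Assumed policy: wrap back to the start of the unlocked region.
        if self.unlocked_ptr >= self.num_ways:
            self.unlocked_ptr = self.locked_ptr
        return way
```

In this model the locked region only grows, and each locked write can shrink the set of ways available to unlocked writes, which matches the abstract's point that the locked region grows at the expense of the unlocked region.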
Method and apparatus for managing a link return stack

Publication number: 20060294346

Abstract: In one or more embodiments, a processor includes a link return stack circuit used for storing branch return addresses, wherein a link return stack controller is configured to determine that one or more entries in the link return stack are invalid as being dependent on a mispredicted branch, and to reset the link return stack to a valid remaining entry, if any. In this manner, branch mispredictions cause dependent entries in the link return stack to be flushed from the link return stack, or otherwise invalidated, while preserving the remaining valid entries, if any, in the link return stack. In at least one embodiment, a branch information queue used for tracking predicted branches is configured to store a marker indicating whether a predicted branch has an associated entry in the link return stack, and it may store an index value identifying the specific, corresponding entry in the link return stack.

Type: Application

Filed: June 22, 2005

Publication date: December 28, 2006

Inventors: Brian Stempel, James Dieffenderfer, Thomas Sartorius, Rodney Smith
Method and apparatus for predicting branch instructions

Publication number: 20060277397

Abstract: A microprocessor includes two branch history tables, and is configured to use a first one of the branch history tables for predicting branch instructions that are hits in a branch target cache, and to use a second one of the branch history tables for predicting branch instructions that are misses in the branch target cache. As such, the first branch history table is configured to have an access speed matched to that of the branch target cache, so that its prediction information is timely available relative to branch target cache hit detection, which may happen early in the microprocessor's instruction pipeline. The second branch history table thus need only be as fast as is required for providing timely prediction information in association with recognizing branch target cache misses as branch instructions, such as at the instruction decode stage(s) of the instruction pipeline.

Type: Application

Filed: June 2, 2005

Publication date: December 7, 2006

Inventors: Thomas Sartorius, Brian Stempel, Jeffrey Bridges, James Dieffenderfer, Rodney Smith
Caching instructions for a multiple-state processor

Publication number: 20060265573

Abstract: A method and apparatus for caching instructions for a processor having multiple operating states. At least two of the operating states of the processor support different instruction sets. A block of instructions may be retrieved from memory while the processor is operating in one of the states. The instructions may be pre-decoded in accordance with said one of the states and loaded into cache. The processor, or another entity, may be used to determine whether the current state of the processor is the same as said one of the states used to pre-decode the instructions when one of the pre-decoded instructions in the cache is needed by the processor.

Type: Application

Filed: May 18, 2005

Publication date: November 23, 2006

Inventors: Rodney Smith, Brian Stempel
Handling cache miss in an instruction crossing a cache line boundary

Publication number: 20060265572

Abstract: A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.

Type: Application

Filed: May 18, 2005

Publication date: November 23, 2006

Inventors: Brian Stempel, Jeffrey Bridges, Rodney Smith, Thomas Sartorius
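
The recycle-and-prefetch behavior in the abstract above can be modeled in a few lines. This is only a behavioral sketch under assumptions: the 32-byte line size, the function names `line_of` and `fetch`, the use of a set of line numbers to stand for the instruction cache, and the instantaneous fill are all hypothetical simplifications, and the case where the first line misses is outside what the sketch covers.

```python
LINE_BYTES = 32  # assumed instruction cache line size


def line_of(addr: int) -> int:
    """Return the cache line number that contains byte address addr."""
    return addr // LINE_BYTES


def fetch(cached_lines: set, addr: int, length: int) -> str:
    """Model one pass of an instruction fetch through the fetch stages.

    cached_lines is a set of line numbers currently in the cache. The first
    line of the instruction is assumed to be present already.
    """
    first = line_of(addr)
    second = line_of(addr + length - 1)
    if second != first and second not in cached_lines:
        # Treat the second-line access as a pre-fetch: fill the cache,
        # pass no data downstream, and recycle the first-line fetch.
        cached_lines.add(second)
        return "recycle"
    # Both pieces (or the single piece) are available in the cache.
    return "fetched"
```

A 4-byte instruction at address 30 spans lines 0 and 1, so with only line 0 cached the first pass returns `"recycle"` and the second pass returns `"fetched"`.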
Forward looking branch target address caching

Publication number: 20060200655

Abstract: A pipelined processor comprises an instruction cache (iCache), a branch target address cache (BTAC), and processing stages, including a stage to fetch from the iCache and the BTAC. To compensate for the number of cycles needed to fetch a branch target address from the BTAC, the fetch from the BTAC leads the fetch of a branch instruction from the iCache by an amount related to the cycles needed to fetch from the BTAC. Disclosed examples either decrement a write address of the BTAC or increment a fetch address of the BTAC, by an amount essentially corresponding to one less than the cycles needed for a BTAC fetch.

Type: Application

Filed: March 4, 2005

Publication date: September 7, 2006

Inventors: Rodney Smith, Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius
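
The address adjustment in the abstract above amounts to simple arithmetic, sketched here under assumptions. Addresses are counted in fetch units, with sequential fetching assumed to advance one unit per cycle, and the function names are hypothetical. The two functions correspond to the two disclosed alternatives: decrementing the BTAC write address, or incrementing the BTAC fetch address.

```python
def btac_lead(btac_fetch_cycles: int) -> int:
    """Lead amount: one less than the cycles needed for a BTAC fetch."""
    return btac_fetch_cycles - 1


def btac_write_address(branch_fetch_address: int, btac_fetch_cycles: int) -> int:
    """Alternative 1: store the target under an earlier (decremented) address."""
    return branch_fetch_address - btac_lead(btac_fetch_cycles)


def btac_lookup_address(icache_fetch_address: int, btac_fetch_cycles: int) -> int:
    """Alternative 2: look up the BTAC with a later (incremented) address."""
    return icache_fetch_address + btac_lead(btac_fetch_cycles)
```

With an assumed 3-cycle BTAC fetch the lead is 2 units, so either alternative makes the BTAC access for a branch at unit 100 start while the iCache is fetching unit 98.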
Power saving methods and apparatus to selectively enable cache bits based on known processor state

Publication number: 20060200686

Abstract: A processor capable of fetching and executing variable length instructions is described having instructions of at least two lengths. The processor operates in multiple modes. One of the modes restricts instructions that can be fetched and executed to the longer length instructions. An instruction cache is used for storing variable length instructions and their associated predecode bit fields in an instruction cache line and storing the instruction address and processor operating mode state information at the time of the fetch in a tag line. The processor operating mode state information indicates the program specified mode of operation of the processor. The processor fetches instructions from the instruction cache for execution.

Type: Application

Filed: March 4, 2005

Publication date: September 7, 2006

Inventors: Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Rodney Smith, Thomas Sartorius
Translation lookaside buffer (TLB) suppression for intra-page program counter relative or absolute address branch instructions

Publication number: 20060149981

Abstract: In a pipelined processor, a pre-decoder in advance of an instruction cache calculates the branch target address (BTA) of PC-relative and absolute address branch instructions. The pre-decoder compares the BTA with the branch instruction address (BIA) to determine whether the target and instruction are in the same memory page. A branch target same page (BTSP) bit indicating this is written to the cache and associated with the instruction. When the branch is executed and evaluated as taken, a TLB access to check permission attributes for the BTA is suppressed if the BTA is in the same page as the BIA, as indicated by the BTSP bit. This reduces power consumption as the TLB access is suppressed and the BTA/BIA comparison is only performed once, when the branch instruction is first fetched. Additionally, the pre-decoder removes the BTA/BIA comparison from the BTA generation and selection critical path.

Type: Application

Filed: December 2, 2004

Publication date: July 6, 2006

Inventors: James Dieffenderfer, Thomas Sartorius, Rodney Smith, Brian Stempel
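
The same-page comparison behind the BTSP bit in the abstract above can be sketched as a comparison of page numbers. The 4 KiB page size and the function names are assumptions for illustration; the abstract does not specify a page size or how the pre-decoder is implemented.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages; the abstract does not give a page size


def pc_relative_target(bia: int, displacement: int) -> int:
    """Branch target address (BTA) of a PC-relative branch at address bia."""
    return bia + displacement


def btsp_bit(bia: int, bta: int, page_shift: int = PAGE_SHIFT) -> bool:
    """Return True when the branch and its target lie in the same page.

    Computed once at pre-decode time in this sketch; a True value means the
    TLB permission check for the target could be suppressed on a taken branch.
    """
    return (bia >> page_shift) == (bta >> page_shift)
```

For a branch at 0x1FF0, a displacement of 8 keeps the target in the same 4 KiB page, while a displacement of 0x20 moves it to the next page, so the bit would be clear and the TLB access would still be needed.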
Pre-decode error handling via branch correction

Publication number: 20060123326

Abstract: In a pipelined processor where instructions are pre-decoded prior to being stored in a cache, an incorrectly pre-decoded instruction is detected during execution in the pipeline. The corresponding instruction is invalidated in the cache, and the instruction is forced to evaluate as a branch instruction. In particular, the branch instruction is evaluated as “mispredicted not taken” with a branch target address of the incorrectly pre-decoded instruction's address. This, with the invalidated cache line, causes the incorrectly pre-decoded instruction to be re-fetched from memory with a precise address. The re-fetched instruction is then correctly pre-decoded, written to the cache, and executed.

Type: Application

Filed: November 22, 2004

Publication date: June 8, 2006

Inventors: Rodney Smith, Brian Stempel, James Dieffenderfer, Jeffrey Bridges, Thomas Sartorius
Performance profiling of microprocessor systems using debug hardware and performance monitor

Publication number: 20060048011

Abstract: A method and system for monitoring the real-time performance of software running on a microprocessor system. Debug hardware is used to select a range of instructions or events to be monitored by a performance monitor interval with the microprocessor system. A comparison is made between each event and start and stop events are identified in the debug hardware. The performance monitor is enabled by the debug hardware, when events occur within the range defined by the debug hardware. Use of the debug hardware for enabling performance monitoring avoids any overhead associated with generating interrupts, or additional code in the application program.

Type: Application

Filed: August 26, 2004

Publication date: March 2, 2006

Applicant: International Business Machines Corporation

Inventors: James Dieffenderfer, Sanjay Patel, Brian Stempel
Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor

Publication number: 20050216703

Abstract: A method and apparatus for executing instructions in a pipeline processor. The method decreases the latency between an instruction cache and a pipeline processor when bubbles occur in the processing stream due to an execution of a branch correction, or when an interrupt changes the sequence of an instruction stream. The latency is reduced when a decode stage for detecting branch prediction and a related instruction queue location have invalid data representing a bubble in the processing stream. Instructions for execution are inserted in parallel into the decode stage and instruction queue, thereby reducing by one cycle time the length of the pipeline stage.

Type: Application

Filed: March 26, 2004

Publication date: September 29, 2005

Applicant: International Business Machines Corporation

Inventors: James Dieffenderfer, Richard Doing, Brian Stempel, Steven Testa, Kenichi Tsuchiya