Patents by Inventor Mark Luttrell

Mark Luttrell has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140365734
    Abstract: Systems and methods for reliably using data storage media. Multiple processors are configured to access a persistent memory. For a given data block corresponding to a write access request from a first processor to the persistent memory, a cache controller prevents any read access of a copy of the given data block in an associated cache. The cache controller prevents any read access until it receives an acknowledgment that the given data block is stored in the persistent memory. Until the acknowledgment is received, the cache controller allows write access of the copy of the given data block in the associated cache only for a thread in the first processor that originally sent the write access request. The cache controller invalidates any copy of the given data block in any cache levels below the associated cache.
    Type: Application
    Filed: June 10, 2013
    Publication date: December 11, 2014
    Inventors: William H. Bridge, Jr., Paul Loewenstein, Mark A. Luttrell
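
The gating described in this entry can be illustrated with a small model. The following is a minimal Python sketch, not the patented implementation: the class name CacheController, its method names, and the dictionary-based cache are assumptions, and the invalidation of copies in lower cache levels is elided.

```python
class CacheController:
    """Toy model of 20140365734: a cached block with an unacknowledged write
    to persistent memory is unreadable by every thread, and writable only
    by the thread that issued the original write."""

    def __init__(self):
        self.cache = {}     # addr -> data (the associated cache)
        self.pending = {}   # addr -> id of the thread awaiting an ack

    def write(self, thread_id, addr, data):
        if self.pending.get(addr, thread_id) != thread_id:
            raise PermissionError("only the original writer may update this block")
        self.cache[addr] = data
        self.pending[addr] = thread_id   # hide the block from all readers

    def read(self, thread_id, addr):
        if addr in self.pending:
            raise PermissionError("write not yet acknowledged by persistent memory")
        return self.cache[addr]

    def acknowledge(self, addr):
        self.pending.pop(addr, None)     # ack received: block readable again
```
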
  • Publication number: 20140075163
    Abstract: Techniques are disclosed relating to suspending execution of a processor thread while monitoring for a write to a specified memory location. An execution subsystem may be configured to perform a load instruction that causes the processor to retrieve data from a specified memory location and atomically begin monitoring for a write to the specified location. The load instruction may be a load-monitor instruction. The execution subsystem may be further configured to perform a wait instruction that causes the processor to suspend execution of a processor thread during at least a portion of an interval specified by the wait instruction and to resume execution of the processor thread at the end of the interval. The wait instruction may be a monitor-wait instruction. The processor may be further configured to resume execution of the processor thread in response to detecting a write to a memory location specified by a previous monitor instruction.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Inventors: Paul N. Loewenstein, Mark A. Luttrell, Paul J. Jordan
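
As a rough software analogue of the instruction pair in this entry, the sketch below models the monitor with threading.Event; the class MonitoredMemory and all method names are invented for illustration, and real hardware arms the monitor as part of the load itself.

```python
import threading

class MonitoredMemory:
    """Sketch of load-monitor / monitor-wait (20140075163): load_monitor
    reads a location and atomically arms a monitor on it; monitor_wait
    suspends the caller until the interval elapses or another thread
    stores to the monitored location, whichever happens first."""

    def __init__(self):
        self.mem = {}
        self.lock = threading.Lock()
        self.monitors = {}   # addr -> Event armed by load_monitor

    def load_monitor(self, addr):
        with self.lock:                    # load + arm as one atomic step
            self.monitors[addr] = threading.Event()
            return self.mem.get(addr)

    def store(self, addr, value):
        with self.lock:
            self.mem[addr] = value
            event = self.monitors.pop(addr, None)
        if event:
            event.set()                    # wake the suspended thread early

    def monitor_wait(self, addr, interval):
        event = self.monitors.get(addr)
        if event:
            event.wait(timeout=interval)   # resume on write or at interval end
```
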
  • Publication number: 20130297910
    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes entries which may be allocated for use by any thread. Control logic detects long latency instructions. Long latency instructions have a latency greater than a given threshold. One example is a load instruction that has a read-after-write (RAW) data dependency on a store instruction that misses a last-level data cache. The long latency instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the long latency instruction are held at a given pipeline stage until the long latency instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the long latency instruction is being serviced.
    Type: Application
    Filed: May 3, 2012
    Publication date: November 7, 2013
    Inventors: Jared C. Smolens, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan
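
The flush-replay-hold sequence in this entry can be sketched as below. This is a schematic Python model under assumptions of my own: the 20-cycle threshold is an arbitrary illustrative value, and the pipeline is just a list of (thread, pc) records.

```python
from collections import namedtuple

Insn = namedtuple("Insn", "tid pc")   # an in-flight instruction
LATENCY_THRESHOLD = 20                # cycles; illustrative, not from the patent

def flush_and_replay(pipeline, tid, replay_pc, held):
    """A long-latency instruction was detected for thread `tid` (e.g. a load
    with a RAW dependence on a store that missed the last-level cache).
    Flush the thread's instructions and hold the thread; replay restarts
    at the selected instruction once the miss is serviced."""
    pipeline[:] = [i for i in pipeline if i.tid != tid]   # flush this thread only
    held[tid] = replay_pc                                 # hold at this stage

def allocate(free_entries, requesters, held):
    """Grant shared-resource entries only to threads that are not held, so a
    stalled thread cannot tie up the dynamically allocated resource."""
    return [(tid, free_entries.pop())
            for tid in requesters if tid not in held and free_entries]

def long_latency_complete(tid, held, fetch):
    fetch(held.pop(tid))   # resume the thread at the replayed instruction
```
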
  • Publication number: 20130290675
    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced.
    Type: Application
    Filed: April 26, 2012
    Publication date: October 31, 2013
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell
  • Patent number: 8412911
    Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.
    Type: Grant
    Filed: June 29, 2009
    Date of Patent: April 2, 2013
    Assignee: Oracle America, Inc.
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail
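
A compact model of the counting scheme in this patent follows. It is a sketch only: the fan-out interfaces (demap, acknowledge) are assumed names, and a real NCU would track multiple concurrent requests rather than one.

```python
class NCU:
    """Sketch of 8412911: the non-cacheable unit fans a TLB demap out to
    local cores and, for broadcast or multicast, to the NCUs of other
    processors, then counts completions and notifies the initiating
    thread once every target has acknowledged."""

    def __init__(self, local_cores, remote_ncus):
        self.local_cores = local_cores
        self.remote_ncus = remote_ncus
        self.outstanding = 0          # the completion counter
        self.initiator = None

    def global_demap(self, initiator, entry, targets=None):
        # targets=None models a broadcast; an explicit list models multicast.
        receivers = targets or (self.local_cores + self.remote_ncus)
        self.initiator = initiator
        self.outstanding = len(receivers)
        for r in receivers:
            r.demap(entry, reply_to=self)   # each target calls ack() when done

    def ack(self):
        self.outstanding -= 1
        if self.outstanding == 0:
            self.initiator.acknowledge()    # global demap request satisfied
```
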
  • Patent number: 8301865
    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
    Type: Grant
    Filed: June 29, 2009
    Date of Patent: October 30, 2012
    Assignee: Oracle America, Inc.
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
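
The queue structure in this patent can be sketched as two small queues plus a selection rule. The Python below is illustrative only: the alternation policy, the (tid, vaddr) request tuples, and all names are assumptions layered on the abstract.

```python
from collections import deque

class MMUPipelines:
    """Sketch of 8301865: the input pipeline's pending request queue (PRQ)
    has an instruction-related and a data-related portion; a selected
    request parks in the output pipeline's translation table entry return
    queue (TTERQ) until its translation returns from memory."""

    def __init__(self):
        self.prq = {"itlb": deque(), "dtlb": deque()}
        self.tterq = {}        # request -> slot awaiting a translation
        self.last = "dtlb"

    def record_miss(self, kind, request):
        self.prq[kind].append(request)           # kind is "itlb" or "dtlb"

    def select_for_service(self):
        # Selection depends on the prior pick: prefer the other portion.
        first = "itlb" if self.last == "dtlb" else "dtlb"
        second = "dtlb" if first == "itlb" else "itlb"
        for kind in (first, second):
            if self.prq[kind]:
                request = self.prq[kind].popleft()   # deallocate PRQ entry
                self.last = kind
                self.tterq[request] = None           # allocate TTERQ entry
                return request
        return None

    def translation_returned(self, request, translation):
        self.tterq.pop(request, None)                # TLB miss serviced
        return translation

    def thread_flush(self, tid):
        # Requests are assumed to be (tid, vaddr) tuples in this sketch.
        self.tterq = {r: v for r, v in self.tterq.items() if r[0] != tid}
```
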
  • Patent number: 8230177
    Abstract: Systems and methods for efficient handling of store misses. A processor comprises a store queue that stores data for committed store instructions. Coupled to the store queue is a cache responsible for ensuring consistent ordering of store operations for all consumers, which may be accomplished by maintaining a corresponding cache line in an exclusive state before executing a store operation. In response to a first committed store instruction missing in the cache, the store queue is configured to convey to the cache a second entry of the plurality of queue entries as a speculative prefetch instruction. This second entry corresponds to a committed store instruction that follows in program order the first committed store instruction of a given thread. If the prefetch instruction misses in the cache, the latency for acquiring a corresponding cache line overlaps with the latency of the first store instruction.
    Type: Grant
    Filed: May 28, 2009
    Date of Patent: July 24, 2012
    Assignee: Oracle America, Inc.
    Inventor: Mark A. Luttrell
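
The overlap trick in this patent is easy to show in miniature. In the sketch below the cache is a stand-in that tracks exclusive lines and elides acquisition latency; holds_exclusive and acquire_exclusive are invented names.

```python
class ToyCache:
    """Stand-in cache: remembers which lines are held exclusive. Real
    acquisition latency is elided; both requests complete immediately."""
    def __init__(self):
        self.exclusive = set()
    def holds_exclusive(self, line):
        return line in self.exclusive
    def acquire_exclusive(self, line):    # demand request or prefetch
        self.exclusive.add(line)

def drain_oldest_store(store_queue, cache):
    """Sketch of 8230177: if the oldest committed store misses the cache,
    also send the next committed store of the same thread (in program
    order) as a speculative prefetch, so the two line acquisitions
    overlap instead of serializing."""
    oldest = store_queue[0]
    if not cache.holds_exclusive(oldest["line"]):
        cache.acquire_exclusive(oldest["line"])        # the demand miss
        if len(store_queue) > 1:
            nxt = store_queue[1]                       # next in program order
            if not cache.holds_exclusive(nxt["line"]):
                cache.acquire_exclusive(nxt["line"])   # overlapped prefetch
```
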
  • Patent number: 8166251
    Abstract: In an embodiment, a processor includes a data cache and a prefetch unit coupled to the data cache. The prefetch unit is configured to identify a prefetch stream in cache misses from the data cache, and the prefetch unit is configured to issue prefetches predicted by the prefetch stream to prefetch data into the data cache. More particularly, the prefetch unit implements one or more stream engines that generate prefetches for respective prefetch streams. Each stream engine is configured to maintain limit data that indicates a number of prefetches that are permitted to be outstanding beyond a most recent demand access. The stream engine is configured to increase the limit responsive to the number of demand accesses that consume prefetched data at least equaling the limit.
    Type: Grant
    Filed: April 20, 2009
    Date of Patent: April 24, 2012
    Assignee: Oracle America, Inc.
    Inventor: Mark A. Luttrell
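
The adaptive limit reads naturally as a small state machine; a sketch follows. The starting limit, the ceiling, and the exact raise rule are illustrative assumptions; the abstract only requires that the limit grows when the count of consumed prefetches at least equals it.

```python
class StreamEngine:
    """Sketch of 8166251: at most `limit` prefetches may be outstanding
    beyond the most recent demand access; the limit is raised once the
    count of demand accesses that consumed prefetched data reaches it."""

    def __init__(self, limit=2, max_limit=8):
        self.limit = limit
        self.max_limit = max_limit
        self.outstanding = 0   # prefetches ahead of the last demand access
        self.consumed = 0      # demand accesses that hit prefetched data

    def demand_access(self, hit_prefetched):
        if self.outstanding:
            self.outstanding -= 1          # the stream advanced one step
        if hit_prefetched:
            self.consumed += 1
            if self.consumed >= self.limit and self.limit < self.max_limit:
                self.limit += 1            # stream keeps up: run further ahead
                self.consumed = 0

    def issue_prefetches(self, issue):
        while self.outstanding < self.limit:
            issue()                        # send the next predicted address
            self.outstanding += 1
```
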
  • Patent number: 8140769
    Abstract: In an embodiment, a processor includes a data cache and a prefetch unit coupled to the data cache. The prefetch unit is configured to detect one or more prefetch streams corresponding to load operations that miss the data cache, and includes a memory configured to store data corresponding to potential prefetch streams. The prefetch unit is configured to confirm a prefetch stream in response to N or more demand accesses to addresses in the prefetch stream, where N is a positive integer greater than one and is dependent on a prefetch pattern being detected. The prefetch unit comprises a plurality of stream engines, each stream engine configured to generate prefetches for a different prefetch stream assigned to that stream engine. The prefetch unit is configured to assign the confirmed prefetch stream to one of the plurality of stream engines.
    Type: Grant
    Filed: April 20, 2009
    Date of Patent: March 20, 2012
    Assignee: Oracle America, Inc.
    Inventor: Mark A. Luttrell
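
A sketch of the confirmation step follows. The N values used here (2 for unit stride, 3 otherwise) are placeholders for "N depends on the pattern", and engine.start is an assumed hand-off interface.

```python
class PrefetchUnit:
    """Sketch of 8140769: misses are matched against potential streams held
    in a small memory; a stream is confirmed after N matching demand
    accesses (N depends on the detected pattern) and is then assigned to
    a free stream engine."""

    def __init__(self, free_engines):
        self.free_engines = free_engines   # idle stream-engine objects
        self.potential = []   # each: {"next": addr, "stride": s, "hits": n}
        self.last_miss = None

    def miss(self, addr):
        for p in self.potential:
            if addr == p["next"]:                     # access fits this stream
                p["hits"] += 1
                p["next"] += p["stride"]
                n = 2 if abs(p["stride"]) == 1 else 3 # N varies with pattern
                if p["hits"] >= n and self.free_engines:
                    engine = self.free_engines.pop()
                    engine.start(addr, p["stride"])   # confirmed: hand off
                    self.potential.remove(p)
                return
        if self.last_miss is not None and addr != self.last_miss:
            stride = addr - self.last_miss            # record a new candidate
            self.potential.append(
                {"next": addr + stride, "stride": stride, "hits": 1})
        self.last_miss = addr
```
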
  • Patent number: 8099586
    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: January 17, 2012
    Assignee: Oracle America, Inc.
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan, Manish Shah
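
The timer-driven early flush can be sketched in a few lines. The 16-cycle threshold is an arbitrary stand-in for the patent's "predetermined threshold value", and the pipeline and refetch interfaces are assumptions.

```python
PENALTY_TIMER_LIMIT = 16   # cycles; illustrative threshold only

class EarlyFlush:
    """Sketch of 8099586: on detecting a misprediction, save the correct
    outcome tagged with the branch's identity and start a timer; if the
    branch has not retired when the timer hits the threshold, flush the
    entire pipeline and refetch from its oldest instruction."""

    def __init__(self):
        self.timer = None
        self.saved = None                 # (branch_tag, correct_outcome)

    def on_mispredict(self, branch_tag, correct_outcome):
        self.saved = (branch_tag, correct_outcome)   # stored before any flush
        self.timer = 0

    def cycle(self, pipeline, refetch):
        if self.timer is None:
            return
        self.timer += 1
        if self.timer >= PENALTY_TIMER_LIMIT:
            restart_pc = pipeline[0].pc   # oldest instruction in the pipeline
            pipeline.clear()              # flush prior to branch retirement
            refetch(restart_pc)           # saved outcome steers the branch
            self.timer = None

    def on_retire(self, branch_tag):
        if self.saved and self.saved[0] == branch_tag:
            self.saved = None             # branch retired normally
            self.timer = None
```
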
  • Patent number: 8099566
    Abstract: Systems and methods for efficient load-store ordering. A processor comprises a store buffer that includes an array. The store buffer dynamically allocates any entry of the array for an out-of-order (o-o-o) issued store instruction independent of a corresponding thread. Circuitry within the store buffer determines a first set of array entries that hold store instructions older in program order than a particular load instruction, wherein the store instructions have the same thread identifier and address as the load instruction. From the first set, the logic locates the single final match entry corresponding to the youngest store instruction of that set, which may be used for read-after-write (RAW) hazard detection.
    Type: Grant
    Filed: May 15, 2009
    Date of Patent: January 17, 2012
    Assignee: Oracle America, Inc.
    Inventor: Mark A. Luttrell
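
The matching rule in this patent amounts to a filtered maximum, shown below; the dictionary entry format and the convention that a larger `age` means younger are assumptions of the sketch.

```python
def find_raw_store(store_buffer, load):
    """Sketch of 8099566: take the buffer entries whose stores share the
    load's thread id and address and are older in program order, then
    return the single youngest of them; that entry is the one used for
    read-after-write (RAW) hazard detection."""
    older_matches = [e for e in store_buffer
                     if e["tid"] == load["tid"]
                     and e["addr"] == load["addr"]
                     and e["age"] < load["age"]]      # older than the load
    return max(older_matches, key=lambda e: e["age"]) if older_matches else None

buf = [{"tid": 0, "addr": 0x40, "age": 3},
       {"tid": 0, "addr": 0x40, "age": 7},
       {"tid": 1, "addr": 0x40, "age": 8}]
hit = find_raw_store(buf, {"tid": 0, "addr": 0x40, "age": 9})
assert hit["age"] == 7   # youngest same-thread, same-address older store
```
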
  • Patent number: 8006075
    Abstract: Systems and methods for storage of writes to memory corresponding to multiple threads. A processor comprises a store queue, wherein the queue dynamically allocates an entry for a committed store instruction, and entries of the array may be allocated out of program order. For a given thread, the store queue conveys store data to a memory in program order. The queue is further configured to identify the entry of the plurality of entries that corresponds to the oldest committed store instruction for a given thread and to determine the next entry of the array that corresponds to the next committed store instruction in program order, wherein the entry for the oldest store includes data identifying that next entry. The queue marks an entry as unfilled upon successfully conveying its store data to the memory.
    Type: Grant
    Filed: May 21, 2009
    Date of Patent: August 23, 2011
    Assignee: Oracle America, Inc.
    Inventor: Mark A. Luttrell
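
Read as a data structure, this patent describes per-thread linked lists threaded through a shared array; a sketch follows. The dict-based entries and method names are invented, and allocation assumes a free entry exists (real hardware gates allocation elsewhere).

```python
class StoreQueue:
    """Sketch of 8006075: entries are allocated out of program order, but
    each entry records which entry holds the same thread's next committed
    store, so draining to memory walks a per-thread list in program order;
    a drained entry is marked unfilled for reallocation."""

    def __init__(self, size):
        self.entries = [None] * size   # each: {"tid", "data", "next"}
        self.head = {}                 # tid -> index of oldest committed store
        self.tail = {}                 # tid -> index of youngest committed store

    def allocate(self, tid, data):
        idx = self.entries.index(None)            # any free entry will do
        self.entries[idx] = {"tid": tid, "data": data, "next": None}
        if tid in self.tail:
            self.entries[self.tail[tid]]["next"] = idx   # link program order
        else:
            self.head[tid] = idx
        self.tail[tid] = idx

    def drain_one(self, tid, memory):
        idx = self.head.get(tid)
        if idx is None:
            return
        entry = self.entries[idx]
        memory.append(entry["data"])   # convey to memory in program order
        self.entries[idx] = None       # mark unfilled
        if entry["next"] is None:
            del self.head[tid], self.tail[tid]
        else:
            self.head[tid] = entry["next"]
```
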
  • Patent number: 7996632
    Abstract: A multithreaded processor with a banked cache is provided. The instruction set includes at least one atomic operation which is executed in the L2 cache if the atomic memory address source data is aligned. The core executing the instruction determines whether the atomic memory address source data is aligned. If it is aligned, the atomic memory address is sent to the bank that contains the atomic memory address source data, and the operation is executed in the bank. In one embodiment, if the instruction is mis-aligned, the operation is executed in the core.
    Type: Grant
    Filed: December 22, 2006
    Date of Patent: August 9, 2011
    Assignee: Oracle America, Inc.
    Inventors: Greg F. Grohoski, Mark A. Luttrell, Manish Shah
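
The routing decision here is a one-liner in spirit; a sketch is below. The bank-selection interleave and the .atomic() interfaces are assumptions not found in the abstract.

```python
def issue_atomic(core, l2_banks, addr, op, width=8, line=64):
    """Sketch of 7996632: the issuing core checks alignment; an aligned
    atomic is routed to the L2 bank holding the target line and executed
    there, while a misaligned atomic falls back to execution in the core."""
    if addr % width == 0:                            # aligned: run in the bank
        bank = l2_banks[(addr // line) % len(l2_banks)]
        return bank.atomic(addr, op)                 # near-memory execution
    return core.atomic(addr, op)                     # misaligned: run in core
```
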
  • Publication number: 20100332787
    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail, Robert T. Golla
  • Publication number: 20100332786
    Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Gregory F. Grohoski, Paul J. Jordan, Mark A. Luttrell, Zeid Hartuon Samoail
  • Publication number: 20100332804
    Abstract: Systems and methods for efficient picking of instructions for out-of-order issue and execution in a processor. In one embodiment, a processor comprises a unified pick queue that is dynamically allocated. Each entry is configured to store age and dependency information relative to other decoded instructions. Also, each entry stores a picked field, which when asserted indicates the decoded instruction has already been picked for out-of-order issue and execution. When asserted, a trigger field indicates a result of a corresponding decoded instruction will be available a predetermined number of clock cycles afterward. A younger instruction dependent on a result of an older instruction is ready to be picked before the result of the older instruction is available. In this case, the older instruction has asserted picked and trigger fields.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Robert T. Golla, Matthew B. Smittle, Mark A. Luttrell, Xiang Shan Li
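
The picked/trigger interplay can be sketched as a readiness scan. The dict entry layout, the fixed result latency, and the index-based `deps` field are all assumptions of this sketch.

```python
RESULT_LATENCY = 3   # cycles from pick to result; illustrative fixed value

def pick(pick_queue):
    """Sketch of 20100332804: an entry may be picked once every producer it
    depends on has either completed or has its picked and trigger fields
    asserted, meaning the producer's result arrives a known number of
    cycles after its own pick; `deps` holds producers' queue indexes."""
    picked_now = []
    for entry in pick_queue:
        if entry["picked"]:
            continue
        producers = [pick_queue[i] for i in entry["deps"]]
        if all(p["done"] or (p["picked"] and p["trigger"]) for p in producers):
            entry["picked"] = True    # issued out of order
            entry["trigger"] = True   # result due RESULT_LATENCY cycles out
            picked_now.append(entry)
    return picked_now
```
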
  • Publication number: 20100306477
    Abstract: Systems and methods for efficient handling of store misses. A processor comprises a store queue that stores data for committed store instructions. Coupled to the store queue is a cache responsible for ensuring consistent ordering of store operations for all consumers, which may be accomplished by maintaining a corresponding cache line in an exclusive state before executing a store operation. In response to a first committed store instruction missing in the cache, the store queue is configured to convey to the cache a second entry of the plurality of queue entries as a speculative prefetch instruction. This second entry corresponds to a committed store instruction that follows in program order the first committed store instruction of a given thread. If the prefetch instruction misses in the cache, the latency for acquiring a corresponding cache line overlaps with the latency of the first store instruction.
    Type: Application
    Filed: May 28, 2009
    Publication date: December 2, 2010
    Inventor: Mark A. Luttrell
  • Publication number: 20100299508
    Abstract: Systems and methods for storage of writes to memory corresponding to multiple threads. A processor comprises a store queue, wherein the queue dynamically allocates an entry for a committed store instruction, and entries of the array may be allocated out of program order. For a given thread, the store queue conveys store data to a memory in program order. The queue is further configured to identify the entry of the plurality of entries that corresponds to the oldest committed store instruction for a given thread and to determine the next entry of the array that corresponds to the next committed store instruction in program order, wherein the entry for the oldest store includes data identifying that next entry. The queue marks an entry as unfilled upon successfully conveying its store data to the memory.
    Type: Application
    Filed: May 21, 2009
    Publication date: November 25, 2010
    Inventor: Mark A. Luttrell
  • Publication number: 20100293347
    Abstract: Systems and methods for efficient load-store ordering. A processor comprises a store buffer that includes an array. The store buffer dynamically allocates any entry of the array for an out-of-order (o-o-o) issued store instruction independent of a corresponding thread. Circuitry within the store buffer determines a first set of array entries that hold store instructions older in program order than a particular load instruction, wherein the store instructions have the same thread identifier and address as the load instruction. From the first set, the logic locates the single final match entry corresponding to the youngest store instruction of that set, which may be used for read-after-write (RAW) hazard detection.
    Type: Application
    Filed: May 15, 2009
    Publication date: November 18, 2010
    Inventor: Mark A. Luttrell
  • Publication number: 20100268893
    Abstract: In an embodiment, a processor comprises a data cache and a prefetch unit coupled to the data cache. The prefetch unit is configured to identify a prefetch stream in cache misses from the data cache, and the prefetch unit is configured to issue prefetches predicted by the prefetch stream to prefetch data into the data cache. More particularly, the prefetch unit implements one or more stream engines that generate prefetches for respective prefetch streams. Each stream engine is configured to maintain limit data that indicates a number of prefetches that are permitted to be outstanding beyond a most recent demand access. The stream engine is configured to increase the limit responsive to the number of demand accesses that consume prefetched data at least equaling the limit.
    Type: Application
    Filed: April 20, 2009
    Publication date: October 21, 2010
    Inventor: Mark A. Luttrell