Patents by Inventor Michael A. Fetterman

Michael A. Fetterman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DYNAMIC BANK MODE ADDRESSING FOR MEMORY ACCESS

Publication number: 20130268715

Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to perform read and write a memory accessing different numbers of bits per bank, e.g., 32-bits per bank, 64-bits per bank, or 128-bits per bank. On each clock cycle an access request may be received from one of the application programs and per processing thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accesses compared with using a single bank mapping for all accesses.

Type: Application

Filed: April 5, 2012

Publication date: October 10, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
UNIFORM LOAD PROCESSING FOR PARALLEL THREAD SUB-SETS

Publication number: 20130232322

Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.

Type: Application

Filed: March 5, 2012

Publication date: September 5, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS

Publication number: 20130212364

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.

Type: Application

Filed: February 9, 2012

Publication date: August 15, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
Vector completion mask handling

Patent number: 8510536

Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.

Type: Grant

Filed: June 28, 2012

Date of Patent: August 13, 2013

Assignee: Intel Corporation

Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton
SHAPED REGISTER FILE READS

Publication number: 20130166877

Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.

Type: Application

Filed: December 22, 2011

Publication date: June 27, 2013

Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN
METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS WITHOUT INSTRUCTION DECODE

Publication number: 20130166882

Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.

Type: Application

Filed: December 22, 2011

Publication date: June 27, 2013

Inventors: Jack Hilaire CHOQUETTE, Robert J. STOLL, Olivier GIROUX, Michael FETTERMAN, Shirish GADRE, Robert Steven GLANVILLE, Alexandre JOLY
BATCHED REPLAYS OF DIVERGENT OPERATIONS

Publication number: 20130159684

Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.

Type: Application

Filed: December 16, 2011

Publication date: June 20, 2013

Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
Vector Completion Mask Handling

Publication number: 20120272046

Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.

Type: Application

Filed: June 28, 2012

Publication date: October 25, 2012

Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton
Vector completion mask handling

Patent number: 8239659

Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.

Type: Grant

Filed: September 29, 2006

Date of Patent: August 7, 2012

Assignee: Intel Corporation

Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton
Method and apparatus for combining micro-operations to process immediate data

Patent number: 7941651

Abstract: A method and apparatus for combining micro-operations to process immediate data. The immediate data may be wider than the immediate data storage capacity of a micro-operation. A first micro-operation is issued to process a first portion of the immediate data, which can be processed within the immediate data storage capacity of a micro-operation. A second micro-operation is issued to process a second portion of the immediate data, which can be processed within the immediate data storage capacity of a micro-operation. Execution of the first and second micro-operations and optionally of a third micro-operation serves to reconstruct the immediate data comprising the first portion and the second portion of the immediate data.

Type: Grant

Filed: June 27, 2002

Date of Patent: May 10, 2011

Assignee: Intel Corporation

Inventors: Bret L. Toll, John Alan Miller, Michael A. Fetterman
Providing a backing store in user-level memory

Patent number: 7500049

Abstract: In one embodiment, the present invention includes a method for requesting an allocation of memory to be a backing store for architectural state information of a processor and storing the architectural state information in the backing store using an application. In this manner, the backing store and processor enhancements using information in the backing store may be transparent to an operating system. Other embodiments are described and claimed.

Type: Grant

Filed: October 31, 2005

Date of Patent: March 3, 2009

Assignee: Intel Corporation

Inventors: Martin Dixon, Michael Cornaby, Michael Fetterman, Per Hammarlund
Load mechanism

Patent number: 7457932

Abstract: A method is disclosed. The method includes scheduling a load operation at least twice the size of a maximum access supported by a memory device, dividing the load operation into a plurality of separate load operation segments having a size equivalent to the maximum access supported by the memory device, and performing each of the plurality of load operation segments. A further method is disclosed where a temporary register is used to minimize the number of memory accesses to support unaligned accesses.

Type: Grant

Filed: December 30, 2005

Date of Patent: November 25, 2008

Assignee: Intel Corporation

Inventors: Per Hammarlund, Stephan Jourdan, Michael Fetterman, Glenn Hinton, Sebastien Hily, Ronak Singhal
Staggered execution stack for vector processing

Patent number: 7457938

Abstract: In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.

Type: Grant

Filed: September 30, 2005

Date of Patent: November 25, 2008

Assignee: Intel Corporation

Inventors: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Ronak Singhal, Glenn Hinton
Flow optimization and prediction for VSSE memory operations

Patent number: 7404065

Abstract: In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (?op) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized ?op flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.

Type: Grant

Filed: December 21, 2005

Date of Patent: July 22, 2008

Assignee: Intel Corporation

Inventors: Stephan Jourdan, Per Hammarlund, Michael Fetterman, Michael P. Cornaby, Glenn Hinton, Avinash Sodani
Vector completion mask handling

Publication number: 20080082785

Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.

Type: Application

Filed: September 29, 2006

Publication date: April 3, 2008

Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton
Vector length tracking mechanism

Publication number: 20070283129

Abstract: According to one embodiment, a method is disclosed. The method includes receiving a value at a vector length (VL) tracker and establishing a VL for subsequent micro-operations (?ops) that are to be executed corresponding to the value.

Type: Application

Filed: December 28, 2005

Publication date: December 6, 2007

Inventors: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Glenn Hinton
Load mechanism

Publication number: 20070156990

Abstract: A method is disclosed. The method includes scheduling a load operation at least twice the size of a maximum access supported by a memory device, dividing the load operation into a plurality of separate load operation segments having a size equivalent to the maximum access supported by the memory device, and performing each of the plurality of load operation segments. A further method is disclosed where a temporary register is used to minimize the number of memory accesses to support unaligned accesses.

Type: Application

Filed: December 30, 2005

Publication date: July 5, 2007

Inventors: Per Hammarlund, Stephan Jourdan, Michael Fetterman, Glenn Hinton, Sebastien Hily, Ronak Singhal
Flow optimization and prediction for VSSE memory operations

Publication number: 20070143575

Abstract: In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (?op) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized ?op flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.

Type: Application

Filed: December 21, 2005

Publication date: June 21, 2007

Inventors: Stephen Jourdan, Per Hammarlund, Michael Fetterman, Michael Cornaby, Glenn Hinton, Avinash Sodani
Providing a backing store in user-level memory

Publication number: 20070101076

Abstract: In one embodiment, the present invention includes a method for requesting an allocation of memory to be a backing store for architectural state information of a processor and storing the architectural state information in the backing store using an application. In this manner, the backing store and processor enhancements using information in the backing store may be transparent to an operating system. Other embodiments are described and claimed.

Type: Application

Filed: October 31, 2005

Publication date: May 3, 2007

Inventors: Martin Dixon, Michael Cornaby, Michael Fetterman, Per Hammarlund
Staggered execution stack for vector processing

Publication number: 20070079179

Abstract: In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.

Type: Application

Filed: September 30, 2005

Publication date: April 5, 2007

Inventors: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Ronak Singhal, Glenn Hinton

prev 1 2 3 4 next