Patents by Inventor Gene W. Shen

Gene W. Shen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7861066
    Abstract: A mechanism for suppressing instruction replay includes a processor having one or more execution units and a scheduler that issue instruction operations for execution by the one or more execution units. The scheduler may also cause instruction operations that are determined to be incorrectly executed to be replayed, or reissued. In addition, a prediction unit within the processor may predict whether a given instruction operation will replay and to provide an indication that the given instruction operation will replay. The processor also includes a decode unit that may decode instructions and in response to detecting the indication, may flag the given instruction operation. The scheduler may further inhibit issue of the flagged instruction operation until a status associated with the flagged instruction is good.
    Type: Grant
    Filed: July 20, 2007
    Date of Patent: December 28, 2010
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ashutosh S. Dhodapkar, Michael G. Butler, Gene W. Shen
  • Patent number: 7818543
    Abstract: A mechanism for superscalar decode of variable length instructions. A length decode unit may obtain a plurality of instruction bytes based on a scan window of a predetermined size. The instruction bytes may be associated with a plurality of variable length instructions, which are scheduled to be executed by a processing unit. The length decode unit may, for each instruction byte, estimate the start of a next variable length instruction following a current variable length instruction, and store a first pointer. A pre-pick unit may, for each instruction byte, use the first pointer to estimate the start of a subsequent variable length instruction following the next variable length instruction within the scan window, and store a second pointer. A pick unit may use a start pointer and related first and second pointers to determine the actual start of the variable length instructions within the scan window, and generate instruction pointers.
    Type: Grant
    Filed: July 10, 2007
    Date of Patent: October 19, 2010
    Assignee: GlobalFoundries Inc.
    Inventors: Gene W. Shen, Sean Lie
  • Patent number: 7818542
    Abstract: A mechanism for superscalar decode of variable length instructions. The decode mechanism may be included within a processing unit, and may comprise a length decode unit. The length decode unit may obtain a plurality of instruction bytes. The instruction bytes may be associated with a plurality of variable length instructions, which are to be executed by the processing unit. The length decode unit may perform a length decode operation for each of the plurality of instruction bytes. For each instruction byte, the length decode unit may estimate the instruction length of a current variable length instruction associated with a current instruction byte. Furthermore, during the length decode operation, for each instruction byte, the length decode unit may estimate the start of a next variable length instruction based on the estimated instruction length of the current variable length instruction, and store a first pointer to the estimated start of the next variable length instruction.
    Type: Grant
    Filed: July 10, 2007
    Date of Patent: October 19, 2010
    Assignee: GLOBALFOUNDRIES Inc.
    Inventors: Gene W. Shen, Sean Lie
  • Patent number: 7743232
    Abstract: A multiple-core processor having a hierarchical microcode store. A processor may include multiple processor cores, each configured to independently execute instructions defined according to a programmer-visible instruction set architecture (ISA). Each core may include a respective local microcode unit configured to store microcode entries. The processor may also include a remote microcode unit accessible by each of the processor cores. Any given one of the processor cores may be configured to generate a given microcode entrypoint corresponding to a particular microcode entry including one or more operations to be executed by the given processor core, and to determine whether the particular microcode entry is stored within the respective local microcode unit of the given core. In response to determining that the particular microcode entry is not stored within the respective local microcode unit, the given core may convey a request for the particular microcode entry to the remote microcode unit.
    Type: Grant
    Filed: July 18, 2007
    Date of Patent: June 22, 2010
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gene W. Shen, Bruce R. Holloway, Sean Lie, Michael G. Butler
  • Patent number: 7725690
    Abstract: In one embodiment, a processor comprises an instruction buffer and a pick unit. The instruction buffer is coupled to receive instructions fetched from an instruction cache. The pick unit is configured to select up to N instructions from the instruction buffer for concurrent transmission to respective slots of a plurality of slots, where N is an integer greater than one. Additionally, the pick unit is configured to transmit an oldest instruction of the selected instructions to any of the plurality of slots even if a number of the selected instructions is greater than one. The pick unit is configured to concurrently transmit other ones of the selected instructions to other slots of the plurality of slots based on the slot to which the oldest instruction is transmitted. Some embodiments comprise a computer system including the processor and a communication device configured to communicate with another computer system.
    Type: Grant
    Filed: February 13, 2007
    Date of Patent: May 25, 2010
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gene W. Shen, Sean Lie
  • Patent number: 7685410
    Abstract: In one embodiment, a processor comprises a branch resolution unit and a redirect recovery cache. The branch resolution unit is configured to detect a mispredicted branch operation, and to transmit a redirect address for fetching instructions from a correct target of the branch operation responsive to detecting the mispredicted branch operation. The redirect recovery cache comprises a plurality of cache entries, each cache entry configured to store operations corresponding to instructions fetched in response to respective mispredicted branch operations. The redirect recovery cache is coupled to receive the redirect address and, if the redirect address is a hit in the redirect recovery cache, the redirect recovery cache is configured to supply operations from the hit cache entry to a pipeline of the processor, bypassing at least one initial pipeline stage.
    Type: Grant
    Filed: February 13, 2007
    Date of Patent: March 23, 2010
    Assignee: Global Foundries Inc.
    Inventors: Gene W. Shen, Sean Lie
  • Publication number: 20090024838
    Abstract: A mechanism for suppressing instruction replay includes a processor having one or more execution units and a scheduler that issue instruction operations for execution by the one or more execution units. The scheduler may also cause instruction operations that are determined to be incorrectly executed to be replayed, or reissued. In addition, a prediction unit within the processor may predict whether a given instruction operation will replay and to provide an indication that the given instruction operation will replay. The processor also includes a decode unit that may decode instructions and in response to detecting the indication, may flag the given instruction operation. The scheduler may further inhibit issue of the flagged instruction operation until a status associated with the flagged instruction is good.
    Type: Application
    Filed: July 20, 2007
    Publication date: January 22, 2009
    Inventors: Ashutosh S. Dhodapkar, Michael G. Butler, Gene W. Shen
  • Publication number: 20090024836
    Abstract: A multiple-core processor having a hierarchical microcode store. A processor may include multiple processor cores, each configured to independently execute instructions defined according to a programmer-visible instruction set architecture (ISA). Each core may include a respective local microcode unit configured to store microcode entries. The processor may also include a remote microcode unit accessible by each of the processor cores. Any given one of the processor cores may be configured to generate a given microcode entrypoint corresponding to a particular microcode entry including one or more operations to be executed by the given processor core, and to determine whether the particular microcode entry is stored within the respective local microcode unit of the given core. In response to determining that the particular microcode entry is not stored within the respective local microcode unit, the given core may convey a request for the particular microcode entry to the remote microcode unit.
    Type: Application
    Filed: July 18, 2007
    Publication date: January 22, 2009
    Inventors: Gene W. Shen, Bruce R. Holloway, Sean Lie, Michael G. Butler
  • Publication number: 20090019257
    Abstract: A mechanism for superscalar decode of variable length instructions. A length decode unit may obtain a plurality of instruction bytes based on a scan window of a predetermined size. The instruction bytes may be associated with a plurality of variable length instructions, which are scheduled to be executed by a processing unit. The length decode unit may, for each instruction byte, estimate the start of a next variable length instruction following a current variable length instruction, and store a first pointer. A pre-pick unit may, for each instruction byte, use the first pointer to estimate the start of a subsequent variable length instruction following the next variable length instruction within the scan window, and store a second pointer. A pick unit may use a start pointer and related first and second pointers to determine the actual start of the variable length instructions within the scan window, and generate instruction pointers.
    Type: Application
    Filed: July 10, 2007
    Publication date: January 15, 2009
    Inventors: Gene W. Shen, Sean Lie
  • Publication number: 20090019263
    Abstract: A mechanism for superscalar decode of variable length instructions. The decode mechanism may be included within a processing unit, and may comprise a length decode unit. The length decode unit may obtain a plurality of instruction bytes. The instruction bytes may be associated with a plurality of variable length instructions, which are to be executed by the processing unit. The length decode unit may perform a length decode operation for each of the plurality of instruction bytes. For each instruction byte, the length decode unit may estimate the instruction length of a current variable length instruction associated with a current instruction byte. Furthermore, during the length decode operation, for each instruction byte, the length decode unit may estimate the start of a next variable length instruction based on the estimated instruction length of the current variable length instruction, and store a first pointer to the estimated start of the next variable length instruction.
    Type: Application
    Filed: July 10, 2007
    Publication date: January 15, 2009
    Inventors: Gene W. Shen, Sean Lie
  • Publication number: 20080195844
    Abstract: In one embodiment, a processor comprises a branch resolution unit and a redirect recovery cache. The branch resolution unit is configured to detect a mispredicted branch operation, and to transmit a redirect address for fetching instructions from a correct target of the branch operation responsive to detecting the mispredicted branch operation. The redirect recovery cache comprises a plurality of cache entries, each cache entry configured to store operations corresponding to instructions fetched in response to respective mispredicted branch operations. The redirect recovery cache is coupled to receive the redirect address and, if the redirect address is a hit in the redirect recovery cache, the redirect recovery cache is configured to supply operations from the hit cache entry to a pipeline of the processor, bypassing at least one initial pipeline stage.
    Type: Application
    Filed: February 13, 2007
    Publication date: August 14, 2008
    Inventors: Gene W. Shen, Sean Lie
  • Publication number: 20080195846
    Abstract: In one embodiment, a processor comprises an instruction buffer and a pick unit. The instruction buffer is coupled to receive instructions fetched from an instruction cache. The pick unit is configured to select up to N instructions from the instruction buffer for concurrent transmission to respective slots of a plurality of slots, where N is an integer greater than one. Additionally, the pick unit is configured to transmit an oldest instruction of the selected instructions to any of the plurality of slots even if a number of the selected instructions is greater than one. The pick unit is configured to concurrently transmit other ones of the selected instructions to other slots of the plurality of slots based on the slot to which the oldest instruction is transmitted. Some embodiments comprise a computer system including the processor and a communication device configured to communicate with another computer system.
    Type: Application
    Filed: February 13, 2007
    Publication date: August 14, 2008
    Inventors: Gene W. Shen, Sean Lie
  • Patent number: 7117290
    Abstract: A processor comprises a cache, a first TLB, and a tag circuit. The cache comprises a data memory storing a plurality of cache lines and a tag memory storing a plurality of tags. Each of the tags corresponds to a respective one of the cache lines. The first TLB stores a plurality of page portions of virtual addresses identifying a plurality of virtual pages for which physical address translations are stored in the first TLB. The tag circuit is configured to identify one or more of the plurality of cache lines that are stored in the cache and are within the plurality of virtual pages. In response to a hit by a first virtual address in the first TLB and a hit by the first virtual address in the tag circuit, the tag circuit is configured to prevent a read of the tag memory in the cache.
    Type: Grant
    Filed: September 3, 2003
    Date of Patent: October 3, 2006
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gene W. Shen, S. Craig Nelson
  • Patent number: 7051300
    Abstract: A method is provided for architectural integrated circuit power estimation. The method may include receiving a plurality of respective energy events, receiving a plurality of base-level energy models, and generating a plurality of power models. Each power model may hierarchically instantiate one or more of the base-level energy models. The method may further include mapping each respective energy event to one or more of the plurality of power models. The method may further include hierarchically evaluating a particular base-level energy model corresponding to a given respective energy event, estimating an energy associated with evaluation of the particular base-level energy model, and accumulating the energy in a power estimate corresponding to the given respective energy event.
    Type: Grant
    Filed: September 4, 2003
    Date of Patent: May 23, 2006
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gene W. Shen, Stephan G. Meier, Leslie A. Barnes, Paul S. Keltcher
  • Patent number: 5966530
    Abstract: A high-performance processor is disclosed with structure and methods for: (1) aggressively scheduling long latency instructions including load/store instructions while maintaining precise state; (2) maintaining and restoring state at any instruction boundary; (3) tracking instruction status; (4) checkpointing instructions; (5) creating, maintaining, and using a time-out checkpoint; (6) tracking floating-point exceptions; (7) creating, maintaining, and using a watchpoint for plural, simultaneous, unresolved-branch evaluation; and (9) increasing processor throughput while maintaining precise state. In one embodiment of the invention, a method of restoring machine state in a processor at any instruction boundary is disclosed. For any instruction which may modify control registers, the processor is either synchronized prior to execution or an instruction checkpoint is stored to preserve state; and for any instruction that creates a program counter discontinuity an instruction checkpoint is stored.
    Type: Grant
    Filed: June 11, 1997
    Date of Patent: October 12, 1999
    Assignee: Fujitsu, Ltd.
    Inventors: Gene W. Shen, John Szeto, Niteen A. Patkar, Michael C. Shebanow
  • Patent number: 5896526
    Abstract: A system and method providing a programmable hardware device within a CPU. The programmable hardware device permits a plurality of instructions to be trapped before they are executed. The instructions that are to be trapped are programmable to provide flexibility during CPU debugging and to ensure that a variety of application programs can be properly executed by the CPU. The system must also provide a means for permitting a trapped instruction to be emulated and/or to be executed serially.
    Type: Grant
    Filed: February 18, 1998
    Date of Patent: April 20, 1999
    Assignee: Fujitsu, Ltd.
    Inventors: Sunil Savkar, Gene W. Shen, Farnad Sajjadian, Michael C. Shebanow
  • Patent number: 5838940
    Abstract: In a microprocessor, apparatus and method coordinate the fetch and issue of instructions by rotating multiple, fetched instructions into an issue order prior to issuance and dispatching selected of the issue ordered instructions. The rotate and dispatch block including a mixer for mixing newly fetched instructions with previously fetched and unissued instructions in physical memory order, a mix and rotate device for rotating the mixed instructions into issue order, an instruction latch for holding the issue ordered instructions prior to dispatch, and an unrotate device for rotating non-issued instructions from issue order to physical memory order prior to mixing with newly fetched instructions.During the fetch cycle, multiple instructions are simultaneously fetched from storage in physical memory order and rotated into a PC-related issue order within the rotate and dispatch block.
    Type: Grant
    Filed: September 8, 1997
    Date of Patent: November 17, 1998
    Assignee: Fujitsu Limited
    Inventors: Sunil Savkar, Michael C. Shebanow, Gene W. Shen, Farnad Sajjadian
  • Patent number: 5751985
    Abstract: Apparatus and method provide for tracking and maintaining precise state by assigning a unique identification tag to each instruction at the time of issue, associating the tag with a storage location in a first active instruction data structure, updating the data stored in the storage location in response to instruction activity status changes for each instruction, and maintaining a plurality of pointers to the storage locations that move in response to the instruction activity status. Status information includes an activity data item, such as an activity bit, that is set at the time the instruction is issued and cleared when execution completes without error. Pointers are established that point to the last issued instruction, the last committed instruction pointer, and reclaimed instruction pointer.
    Type: Grant
    Filed: June 7, 1995
    Date of Patent: May 12, 1998
    Assignee: Hal Computer Systems, Inc.
    Inventors: Gene W. Shen, John Szeto, Niteen A. Patkar, Michael C. Shebanow
  • Patent number: 5675759
    Abstract: In a microprocessor, an apparatus is included for coordinating the use of physical registers in the microprocessor. Upon receiving an instruction, the coordination apparatus extracts source and destination logical registers from the instruction. For the destination logical register, the apparatus assigns a physical address to correspond to the logical register. In so doing, the apparatus stores the former relationship between the logical register and another physical register. Storing this former relationship allows the apparatus to backstep to a particular instruction when an execution exception is encountered. Also, the apparatus checks the instruction to determine whether it is a speculative branch instruction. If so, then the apparatus creates a checkpoint by storing selected state information. This checkpoint provides a reference point to which the processor may later backup if it is determined that a speculated branch was incorrectly predicted.
    Type: Grant
    Filed: September 1, 1995
    Date of Patent: October 7, 1997
    Inventors: Michael C. Shebanow, Gene W. Shen, Ravi Swami, Niteen A. Patkar
  • Patent number: 5673426
    Abstract: An out of program control order execution data processor that comprises an issue unit, execution means, a floating point exception unit a precise state unit, a floating point status register, and writing means. The issue unit issues instructions in program control order for execution. The issued instructions include floating point instructions and non-floating point instructions. The execution means executes the issued instructions such that at least the floating point instructions may be executed out of program control order by the execution means. The floating point exception unit includes a data storage structure including storage elements. Each issued instruction corresponds to one of the storage elements. Each storage element has a floating point instruction identifying field and a floating point trap type field.
    Type: Grant
    Filed: June 7, 1995
    Date of Patent: September 30, 1997
    Assignee: HaL Computer Systems, Inc.
    Inventors: Gene W. Shen, John Szeto, Michael C. Shebanow