Patents by Inventor Paul E. Kitchin

Paul E. Kitchin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Memory load to load fusing

Patent number: 10956155

Abstract: A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.

Type: Grant

Filed: May 23, 2019

Date of Patent: March 23, 2021

Inventors: Paul E. Kitchin, Rama S. Gopal, Karthik Sundaram

Register renaming of a shareable instruction operand cache

Patent number: 10891135

Abstract: A system and a method are disclosed to process instructions in an execution unit (EU) that includes an operand cache (OC). The OC stores a copy of at least one frequently used operand stored in a physical register file (PRF). The EU may process instructions using operands obtained from the PRF or from the OC. In the first mode, an OC renaming unit (OC-REN) indicates to the EU to process instructions using operands obtained from the OC if processing the instructions using operands obtained from the OC uses less power than using operands obtained from the PRF. In the second mode, the OC-REN indicates to the EU to process the instructions using operands obtained from the PRF if processing the instructions using operands obtained from the PRF uses less power than using operands obtained from the OC.

Type: Grant

Filed: March 6, 2019

Date of Patent: January 12, 2021

Inventors: Paul E. Kitchin, Nicholas Humphries, Ken Yu Lim, Ryan Hensley
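As a rough illustration of the mode selection described in the abstract above, the following Python sketch picks an operand source by comparing two power estimates. The function name, the numeric power estimates, and the tie-breaking rule (fall back to the PRF) are assumptions made for illustration only; the abstract does not say how power is estimated or what happens on a tie.

```python
from enum import Enum


class OperandSource(Enum):
    OC = "operand cache"
    PRF = "physical register file"


def select_operand_source(oc_power: float, prf_power: float) -> OperandSource:
    """Return the source a hypothetical OC-REN would indicate to the EU.

    Assumption: both arguments are comparable power estimates for processing
    the same instructions. Ties go to the PRF (not stated in the abstract).
    """
    if oc_power < prf_power:
        return OperandSource.OC   # first mode: using the OC costs less power
    return OperandSource.PRF      # second mode: using the PRF costs less (or equal) power
```

For example, `select_operand_source(1.0, 2.0)` returns `OperandSource.OC`, while swapping the arguments returns `OperandSource.PRF`.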
REGISTER RENAMING OF A SHAREABLE INSTRUCTION OPERAND CACHE

Publication number: 20200225954

Abstract: A system and a method are disclosed to process instructions in an execution unit (EU) that includes an operand cache (OC). The OC stores a copy of at least one frequently used operand stored in a physical register file (PRF). The EU may process instructions using operands obtained from the PRF or from the OC. In the first mode, an OC renaming unit (OC-REN) indicates to the EU to process instructions using operands obtained from the OC if processing the instructions using operands obtained from the OC uses less power than using operands obtained from the PRF. In the second mode, the OC-REN indicates to the EU to process the instructions using operands obtained from the PRF if processing the instructions using operands obtained from the PRF uses less power than using operands obtained from the OC.

Type: Application

Filed: March 6, 2019

Publication date: July 16, 2020

Inventors: Paul E. KITCHIN, Nicholas HUMPHRIES, Ken Yu LIM, Ryan HENSLEY

MEMORY LOAD TO LOAD FUSING

Publication number: 20190278603

Abstract: A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.

Type: Application

Filed: May 23, 2019

Publication date: September 12, 2019

Inventors: Paul E. KITCHIN, Rama S. GOPAL, Karthik SUNDARAM

Memory load to load fusing

Patent number: 10372452

Abstract: A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.

Type: Grant

Filed: June 6, 2017

Date of Patent: August 6, 2019

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Paul E. Kitchin, Rama S. Gopal, Karthik Sundaram

Instruction prefetcher dynamically controlled by readily available prefetcher accuracy

Patent number: 10296463

Abstract: According to one general aspect, an apparatus may include a branch prediction unit, a fetch unit, and a pre-fetch circuit or unit. The branch prediction unit may be configured to output a predicted instruction. The fetch unit may be configured to fetch a next instruction from a cache memory. The pre-fetcher circuit may be configured to pre-fetch a previously predicted instruction into the cache memory based upon a relationship between the predicted instruction and the next instruction.

Type: Grant

Filed: April 18, 2016

Date of Patent: May 21, 2019

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Paul E. Kitchin

Memory load and arithmetic load unit (ALU) fusing

Patent number: 10275217

Abstract: According to one general aspect, a load unit may include a load circuit configured to load at least one piece of data from a memory. The load unit may include an alignment circuit configured to align the data to generate an aligned data. The load unit may also include a mathematical operation execution circuit configured to generate a resultant of a predetermined mathematical operation with the at least one piece of data as an operand. Wherein the load unit is configured to, if an active instruction is associated with the predetermined mathematical operation, bypass the alignment circuit and input the piece of data directly to the mathematical operation execution circuit.

Type: Grant

Filed: June 2, 2017

Date of Patent: April 30, 2019

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Rama S. Gopal, Paul E. Kitchin, Karthik Sundaram

MEMORY LOAD AND ARITHMETIC LOAD UNIT (ALU) FUSING

Publication number: 20180267775

Abstract: According to one general aspect, a load unit may include a load circuit configured to load at least one piece of data from a memory. The load unit may include an alignment circuit configured to align the data to generate an aligned data. The load unit may also include a mathematical operation execution circuit configured to generate a resultant of a predetermined mathematical operation with the at least one piece of data as an operand. Wherein the load unit is configured to, if an active instruction is associated with the predetermined mathematical operation, bypass the alignment circuit and input the piece of data directly to the mathematical operation execution circuit.

Type: Application

Filed: June 2, 2017

Publication date: September 20, 2018

Inventors: Rama S. GOPAL, Paul E. KITCHIN, Karthik SUNDARAM

MEMORY LOAD TO LOAD FUSING

Publication number: 20180267800

Abstract: A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.

Type: Application

Filed: June 6, 2017

Publication date: September 20, 2018

Inventors: Paul E. KITCHIN, Rama S. GOPAL, Karthik SUNDARAM

INSTRUCTION PREFETCHER DYNAMICALLY CONTROLLED BY READILY AVAILABLE PREFETCHER ACCURACY

Publication number: 20170199739

Abstract: According to one general aspect, an apparatus may include a branch prediction unit, a fetch unit, and a pre-fetch circuit or unit. The branch prediction unit may be configured to output a predicted instruction. The fetch unit may be configured to fetch a next instruction from a cache memory. The pre-fetcher circuit may be configured to pre-fetch a previously predicted instruction into the cache memory based upon a relationship between the predicted instruction and the next instruction.

Type: Application

Filed: April 18, 2016

Publication date: July 13, 2017

Inventor: Paul E. KITCHIN

Methods and apparatus related to processor sleep states

Patent number: 9383801

Abstract: A system includes a processor including at least a first core and a local interrupt controller associated with the first core. The first core is operable to store its architectural state prior to entering a first core sleep state, and the processor is operable to receive and implement a request for entering a system sleep state in which the first core is in the first core sleep state and the local interrupt controller is powered down and exit the system sleep state by restoring the local interrupt controller and restoring the saved architectural state of the first core.

Type: Grant

Filed: December 21, 2012

Date of Patent: July 5, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Alexander J. Branover, Andrew W. Lueck, Paul E. Kitchin, David A. Kaplan

Accelerated cache rinse when preparing a power state transition

Patent number: 9317100

Abstract: Methods, integrated circuit devices, and fabrication processes relating to power management transitions of a compute unit comprising a cache are presented. One method includes, responsive to an indication that the compute unit is attempting to enter a low power state, detecting at least one line of the cache differing from the corresponding line in memory, writing differing data from the at least one differing line to the memory, flushing at least one remaining differing line of the cache, and permitting the compute unit to enter the low power state, wherein the detecting and the writing are performed at a first frequency prior to the indication and at a second frequency subsequent to the indication, and the second frequency is higher than the first frequency.

Type: Grant

Filed: January 10, 2013

Date of Patent: April 19, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Paul E. Kitchin, William L. Walker
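The sequence in the abstract above (rinse slowly before the low-power indication, rinse faster after it, flush what remains, then permit the transition) can be sketched as a small Python simulation. Everything here is a hypothetical model rather than the patented design: the cache and memory are plain dictionaries mapping line address to data, a "frequency" is modelled as one write-back every `period` ticks, and the names `rinse` and `prepare_low_power` are invented for illustration.

```python
from typing import Dict, Tuple


def rinse(cache: Dict[int, int], memory: Dict[int, int], ticks: int, period: int) -> int:
    """Write back at most one differing line every `period` ticks; return the count."""
    written = 0
    for tick in range(ticks):
        if tick % period:
            continue  # the modelled rinse engine is idle on this tick
        # Detect cache lines whose data differs from the corresponding memory line.
        differing = [addr for addr, data in cache.items() if memory.get(addr) != data]
        if not differing:
            break
        memory[differing[0]] = cache[differing[0]]
        written += 1
    return written


def prepare_low_power(cache: Dict[int, int], memory: Dict[int, int],
                      ticks_before: int, ticks_after: int,
                      slow_period: int, fast_period: int) -> Tuple[int, int, int]:
    """Return (lines rinsed before the indication, rinsed after it, flushed at the end)."""
    if fast_period >= slow_period:
        raise ValueError("the second rinse frequency must be higher than the first")
    before = rinse(cache, memory, ticks_before, slow_period)
    # The low-power indication arrives here; rinsing continues at the higher rate.
    after = rinse(cache, memory, ticks_after, fast_period)
    # Flush any lines that still differ, then empty the cache before sleeping.
    remaining = [addr for addr, data in cache.items() if memory.get(addr) != data]
    for addr in remaining:
        memory[addr] = cache[addr]
    cache.clear()
    return before, after, len(remaining)
```

In this toy model, a shorter period after the indication means fewer lines are left for the final flush, which is the latency benefit the abstract points at.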
Method and apparatus for handling processor read-after-write hazards with cache misses

Patent number: 9274970

Abstract: According to one general aspect, an apparatus may include an instruction fetch unit, an execution unit, and a cache resynchronization predictor, as described above. The instruction fetch unit may be configured to issue a first memory read operation to a memory address, and a first memory write operation to the memory address, wherein the first memory read operation is stored at an instruction address. The execution unit may be configured to execute the first memory read operation, wherein the execution of the first memory read operation causes a resynchronization exception. The cache resynchronization predictor may be configured to associate the instruction address with a resynchronization exception, and determine if a memory read operation stored at the instruction address comprises a resynchronization predicted store.

Type: Grant

Filed: June 17, 2014

Date of Patent: March 1, 2016

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Paul E. Kitchin
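One way to picture the predictor in the abstract above is as a table keyed by instruction address. The Python sketch below is a minimal model under that assumption: the class name, its methods, and the use of an unbounded set are all hypothetical, and a hardware predictor would have finite capacity and a replacement policy that the abstract does not describe.

```python
class ResyncPredictor:
    """Toy model: remembers instruction addresses whose reads caused a resynchronization exception."""

    def __init__(self) -> None:
        self._flagged = set()  # instruction addresses seen to cause an exception

    def record_exception(self, instruction_address: int) -> None:
        # Associate the instruction address with a resynchronization exception.
        self._flagged.add(instruction_address)

    def predicts_resync(self, instruction_address: int) -> bool:
        # Check whether a read stored at this address was flagged earlier.
        return instruction_address in self._flagged
```

A pipeline model could call `record_exception` when a read triggers the exception and consult `predicts_resync` the next time the same instruction address is fetched.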
DECOUPLING L2 BTB FROM L2 CACHE TO ACCELERATE SEARCH FOR MISS AFTER MISS

Publication number: 20150268961

Abstract: According to one general aspect, a method may include requesting, from a second tier of a cache memory system, a first instruction stored at a first memory address. The method may also include requesting, from a second tier of a branch target buffer system, a branch record associated with the first memory address. The method may also include receiving the branch record before receiving the first instruction. The method may also include pre-fetching, in response to receiving the branch record and before receiving the first instruction, a non-sequential instruction stored at a non-sequential memory address.

Type: Application

Filed: August 19, 2014

Publication date: September 24, 2015

Inventors: Gerald D. ZURASKI, Vikas K. SINHA, David M. MIELKE, Paul E. KITCHIN

METHOD AND APPARATUS FOR HANDLING PROCESSOR READ-AFTER-WRITE HAZARDS WITH CACHE MISSES

Publication number: 20150186285

Abstract: According to one general aspect, an apparatus may include an instruction fetch unit, an execution unit, and a cache resynchronization predictor, as described above. The instruction fetch unit may be configured to issue a first memory read operation to a memory address, and a first memory write operation to the memory address, wherein the first memory read operation is stored at an instruction address. The execution unit may be configured to execute the first memory read operation, wherein the execution of the first memory read operation causes a resynchronization exception. The cache resynchronization predictor may be configured to associate the instruction address with a resynchronization exception, and determine if a memory read operation stored at the instruction address comprises a resynchronization predicted store.

Type: Application

Filed: June 17, 2014

Publication date: July 2, 2015

Inventor: Paul E. KITCHIN

ACCELERATED CACHE RINSE WHEN PREPARING A POWER STATE TRANSITION

Publication number: 20140195832

Abstract: Methods, integrated circuit devices, and fabrication processes relating to power management transitions of a compute unit comprising a cache are presented. One method includes, responsive to an indication that the compute unit is attempting to enter a low power state, detecting at least one line of the cache differing from the corresponding line in memory, writing differing data from the at least one differing line to the memory, flushing at least one remaining differing line of the cache, and permitting the compute unit to enter the low power state, wherein the detecting and the writing are performed at a first frequency prior to the indication and at a second frequency subsequent to the indication, and the second frequency is higher than the first frequency.

Type: Application

Filed: January 10, 2013

Publication date: July 10, 2014

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Paul E. Kitchin, William L. Walker

METHODS AND APPARATUS RELATED TO PROCESSOR SLEEP STATES

Publication number: 20140181557

Abstract: A system includes a processor including at least a first core and a local interrupt controller associated with the first core. The first core is operable to store its architectural state prior to entering a first core sleep state, and the processor is operable to receive and implement a request for entering a system sleep state in which the first core is in the first core sleep state and the local interrupt controller is powered down and exit the system sleep state by restoring the local interrupt controller and restoring the saved architectural state of the first core.

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Alexander J. Branover, Andrew W. Lueck, Paul E. Kitchin, David A. Kaplan