Patents by Inventor Kai Troester

Kai Troester has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11847463
    Abstract: A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of the plurality of lanes are enabled and which are disabled for the operation. The execution pipeline executes the instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: December 19, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kai Troester, Scott Thomas Bingham, John M. King, Michael Estlick, Erik Swanson, Robert Weidner
  • Publication number: 20230315454
    Abstract: A method of fusing no-op (NOP) instructions includes receiving a NOP instruction and generating, based on the NOP instruction and at least one other instruction, a fused NOP instruction.
    Type: Application
    Filed: March 30, 2022
    Publication date: October 5, 2023
    Inventor: Kai Troester
  • Patent number: 11467838
    Abstract: Systems, apparatuses, and methods for implementing a fastpath microcode sequencer are disclosed. A processor includes at least an instruction decode unit and first and second microcode units. For each received instruction, the instruction decode unit forwards the instruction to the first microcode unit if the instruction satisfies at least a first condition. In one implementation, the first condition is the instruction being classified as a frequently executed instruction. If a received instruction satisfies at least a second condition, the instruction decode unit forwards the received instruction to a second microcode unit. In one implementation, the first microcode unit is a smaller, faster structure than the second microcode unit. In one implementation, the second condition is the instruction being classified as an infrequently executed instruction.
    Type: Grant
    Filed: May 22, 2018
    Date of Patent: October 11, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kai Troester, Magiting Talisayon, Hongwen Gao, Benjamin Floering, Emil Talpes
  • Patent number: 11294724
    Abstract: An approach is provided for allocating a shared resource to threads in a multi-threaded microprocessor based upon the usefulness of the shared resource to each of the threads. The usefulness of a shared resource to a thread is determined based upon the number of entries in the shared resource that are allocated to the thread and the number of active entries that the thread has in the shared resource. Threads that are allocated a large number of entries in the shared resource and have a small number of active entries in the shared resource, indicative of a low level of parallelism, can operate efficiently with fewer entries in the shared resource, and have their allocation limit in the shared resource reduced.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: April 5, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kai Troester, Neil Marketkar, Matthew T. Sobel, Srinivas Keshav
  • Publication number: 20220058025
    Abstract: Systems, apparatuses, and methods for arbitrating threads in a computing system are disclosed. A computing system includes a processor with multiple cores, each capable of simultaneously processing instructions of multiple threads. When a thread throttling unit receives an indication that a shared cache has resource contention, the throttling unit sets a threshold number of cache misses for the cache. If the number of cache misses exceeds this threshold, then the throttling unit notifies a particular upstream computation unit to throttle the processing of instructions for the thread. After a time period elapses, if the number of cache misses continues to exceed the threshold, then the throttling unit notifies the upstream computation unit to more restrictively throttle the thread by reducing the selection rate, increasing the time period, or both. Otherwise, the throttling unit notifies the upstream computation unit to less restrictively throttle the thread.
    Type: Application
    Filed: November 5, 2021
    Publication date: February 24, 2022
    Inventors: Paul James Moyer, Douglas Benson Hunt, Kai Troester
  • Publication number: 20220027162
    Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
    Type: Application
    Filed: October 8, 2021
    Publication date: January 27, 2022
    Inventors: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
  • Patent number: 11169812
    Abstract: Systems, apparatuses, and methods for arbitrating threads in a computing system are disclosed. A computing system includes a processor with multiple cores, each capable of simultaneously processing instructions of multiple threads. When a thread throttling unit receives an indication that a shared cache has resource contention, the throttling unit sets a threshold number of cache misses for the cache. If the number of cache misses exceeds this threshold, then the throttling unit notifies a particular upstream computation unit to throttle the processing of instructions for the thread. After a time period elapses, if the number of cache misses continues to exceed the threshold, then the throttling unit notifies the upstream computation unit to more restrictively throttle the thread by reducing the selection rate, increasing the time period, or both. Otherwise, the throttling unit notifies the upstream computation unit to less restrictively throttle the thread.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: November 9, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Paul James Moyer, Douglas Benson Hunt, Kai Troester
  • Patent number: 11144324
    Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: October 12, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
  • Patent number: 11144353
    Abstract: Techniques are disclosed for use in a microprocessor core for soft watermarking in thread-shared resources, implemented through thread mediation. A thread is removed from a thread mediation decision involving multiple threads competing for, or requesting to use, a shared resource at a current clock cycle, based on the number of entries in the shared resource that the thread is estimated to have allocated to it at the current clock cycle. Removing the thread from the thread mediation decision stalls it from allocating additional entries in the shared resource.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: October 12, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Kai Troester
  • Patent number: 11048506
    Abstract: A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit is disclosed. Store-load pairs which have a strong history of store-to-load forwarding are identified. Once identified, the load is memory renamed to the register stored by the store. The memory dependency predictor may also be used to detect loads that are dependent on a store but cannot be renamed. In such a configuration, the dependence is signaled to the load store unit and the load store unit uses the information to issue the load after the identified store has its physical address.
    Type: Grant
    Filed: June 24, 2019
    Date of Patent: June 29, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Krishnan V. Ramani, Kai Troester, Frank C. Galloway, David N. Suggs, Michael D. Achenbach, Betty Ann McDaniel, Marius Evers
  • Patent number: 11023241
    Abstract: Systems and methods selectively bypass address-generation hardware in processor instruction pipelines. In an embodiment, a processor includes an address-generation stage and an address-generation-bypass-determination unit (ABDU). The ABDU receives a load/store instruction. If an effective address for the load/store instruction is not known at the ABDU, the ABDU routes the load/store instruction via the address-generation stage of the processor. If, however, the effective address of the load/store instruction is known at the ABDU, the ABDU routes the load/store instruction to bypass the address-generation stage of the processor.
    Type: Grant
    Filed: August 21, 2018
    Date of Patent: June 1, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Andrej Kocev, Jay Fleischman, Kai Troester, Johnny C. Chu, Tim J. Wilkens, Neil Marketkar, Michael W. Long
  • Publication number: 20210096874
    Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
    Type: Application
    Filed: September 27, 2019
    Publication date: April 1, 2021
    Inventors: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
  • Publication number: 20210096873
    Abstract: Systems, apparatuses, and methods for arbitrating threads in a computing system are disclosed. A computing system includes a processor with multiple cores, each capable of simultaneously processing instructions of multiple threads. When a thread throttling unit receives an indication that a shared cache has resource contention, the throttling unit sets a threshold number of cache misses for the cache. If the number of cache misses exceeds this threshold, then the throttling unit notifies a particular upstream computation unit to throttle the processing of instructions for the thread. After a time period elapses, if the number of cache misses continues to exceed the threshold, then the throttling unit notifies the upstream computation unit to more restrictively throttle the thread by reducing the selection rate, increasing the time period, or both. Otherwise, the throttling unit notifies the upstream computation unit to less restrictively throttle the thread.
    Type: Application
    Filed: September 26, 2019
    Publication date: April 1, 2021
    Inventors: Paul James Moyer, Douglas Benson Hunt, Kai Troester
  • Publication number: 20210096920
    Abstract: An approach is provided for allocating a shared resource to threads in a multi-threaded microprocessor based upon the usefulness of the shared resource to each of the threads. The usefulness of a shared resource to a thread is determined based upon the number of entries in the shared resource that are allocated to the thread and the number of active entries that the thread has in the shared resource. Threads that are allocated a large number of entries in the shared resource and have a small number of active entries in the shared resource, indicative of a low level of parallelism, can operate efficiently with fewer entries in the shared resource, and have their allocation limit in the shared resource reduced.
    Type: Application
    Filed: September 27, 2019
    Publication date: April 1, 2021
    Inventors: Kai Troester, Neil Marketkar, Matthew T. Sobel, Srinivas Keshav
  • Publication number: 20210096914
    Abstract: Techniques are disclosed for use in a microprocessor core for soft watermarking in thread-shared resources, implemented through thread mediation. A thread is removed from a thread mediation decision involving multiple threads competing for, or requesting to use, a shared resource at a current clock cycle, based on the number of entries in the shared resource that the thread is estimated to have allocated to it at the current clock cycle. Removing the thread from the thread mediation decision stalls it from allocating additional entries in the shared resource.
    Type: Application
    Filed: September 27, 2019
    Publication date: April 1, 2021
    Inventor: Kai Troester
  • Publication number: 20210096857
    Abstract: A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of the plurality of lanes are enabled and which are disabled for the operation. The execution pipeline executes the instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
    Type: Application
    Filed: September 27, 2019
    Publication date: April 1, 2021
    Inventors: Kai Troester, Scott Thomas Bingham, John M. King, Michael Estlick, Erik Swanson, Robert Weidner
  • Publication number: 20200065108
    Abstract: Systems and methods selectively bypass address-generation hardware in processor instruction pipelines. In an embodiment, a processor includes an address-generation stage and an address-generation-bypass-determination unit (ABDU). The ABDU receives a load/store instruction. If an effective address for the load/store instruction is not known at the ABDU, the ABDU routes the load/store instruction via the address-generation stage of the processor. If, however, the effective address of the load/store instruction is known at the ABDU, the ABDU routes the load/store instruction to bypass the address-generation stage of the processor.
    Type: Application
    Filed: August 21, 2018
    Publication date: February 27, 2020
    Inventors: Andrej Kocev, Jay Fleischman, Kai Troester, Johnny C. Chu, Tim J. Wilkens, Neil Marketkar, Michael W. Long
  • Publication number: 20190361699
    Abstract: Systems, apparatuses, and methods for implementing a fastpath microcode sequencer are disclosed. A processor includes at least an instruction decode unit and first and second microcode units. For each received instruction, the instruction decode unit forwards the instruction to the first microcode unit if the instruction satisfies at least a first condition. In one implementation, the first condition is the instruction being classified as a frequently executed instruction. If a received instruction satisfies at least a second condition, the instruction decode unit forwards the received instruction to a second microcode unit. In one implementation, the first microcode unit is a smaller, faster structure than the second microcode unit. In one implementation, the second condition is the instruction being classified as an infrequently executed instruction.
    Type: Application
    Filed: May 22, 2018
    Publication date: November 28, 2019
    Inventors: Kai Troester, Magiting Talisayon, Hongwen Gao, Benjamin Floering, Emil Talpes
  • Publication number: 20190310845
    Abstract: A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit is disclosed. Store-load pairs which have a strong history of store-to-load forwarding are identified. Once identified, the load is memory renamed to the register stored by the store. The memory dependency predictor may also be used to detect loads that are dependent on a store but cannot be renamed. In such a configuration, the dependence is signaled to the load store unit and the load store unit uses the information to issue the load after the identified store has its physical address.
    Type: Application
    Filed: June 24, 2019
    Publication date: October 10, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Krishnan V. Ramani, Kai Troester, Frank C. Galloway, David N. Suggs, Michael D. Achenbach, Betty Ann McDaniel, Marius Evers
  • Patent number: 10331357
    Abstract: A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit is disclosed. The system and method include: storing data in one or more memory-dependent architectural register numbers (MdArns); allocating the one or more MdArns to a MEMFILE; writing the allocated one or more MdArns to a map file, wherein the map file contains an MdArn map to enable subsequent access to an entry in the MEMFILE; upon receipt of a load request, checking a base, an index, a displacement, and a match/hit via the map file to identify an entry in the MEMFILE and an associated store; and, on a hit, providing the entry from the one or more MdArns in response to the load request.
    Type: Grant
    Filed: December 15, 2016
    Date of Patent: June 25, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Betty Ann McDaniel, Michael D. Achenbach, David N. Suggs, Frank C. Galloway, Kai Troester, Krishnan V. Ramani
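The two-mode masked SIMD load in patent 11847463 (application 20210096857) can be illustrated with a small software model. This is a sketch under stated assumptions, not AMD's hardware design: memory is a dict of element-sized addresses, a "single load" faults if any element of the block is unmapped, and disabled lanes produce 0 (zero-masking is an assumption; the abstract does not say what disabled lanes hold). The names `load_block`, `load_lane`, and `masked_simd_add` are hypothetical.

```python
class MemoryFault(Exception):
    """Raised when an access touches an unmapped address."""


def load_block(memory, base, lanes):
    # Models one wide load: it faults if ANY element of the block is unmapped,
    # including elements that belong to disabled lanes.
    if any(base + i not in memory for i in range(lanes)):
        raise MemoryFault(base)
    return [memory[base + i] for i in range(lanes)]


def load_lane(memory, addr):
    # Models one scalar load issued for a single enabled lane.
    if addr not in memory:
        raise MemoryFault(addr)
    return memory[addr]


def masked_simd_add(memory, base, mask, addend):
    """Add `addend` to every enabled lane; disabled lanes yield 0 (assumed)."""
    lanes = len(mask)
    try:
        # First execution mode: attempt a single load for the whole block.
        data = load_block(memory, base, lanes)
        mode = "single-load"
    except MemoryFault:
        # Second execution mode: re-execute with one load per enabled lane,
        # so unmapped memory behind disabled lanes is never touched.
        data = [load_lane(memory, base + i) if enabled else 0
                for i, enabled in enumerate(mask)]
        mode = "per-lane"
    result = [x + addend if enabled else 0 for x, enabled in zip(data, mask)]
    return result, mode
```

The fast path pays for one access in the common case; only a fault triggers the slower per-lane replay. A fault on an enabled lane still propagates from the second mode, as a genuine fault should.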
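The NOP-fusion idea in application 20230315454 is described only at a high level, so the following is a hypothetical sketch: it assumes that "fusing" means folding each NOP into the preceding instruction so the NOP no longer occupies its own downstream slot. The `Op` record and its `fused_nops` field are illustrative names, not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    fused_nops: int = 0  # hypothetical: number of NOPs folded into this op


def fuse_nops(stream):
    """Fold each NOP into the preceding instruction (hypothetical policy)."""
    fused = []
    for name in stream:
        if name == "nop" and fused:
            # The NOP rides along with the previous op (which may itself
            # be a NOP) instead of being dispatched separately.
            fused[-1].fused_nops += 1
        else:
            fused.append(Op(name))
    return fused
```

For example, `fuse_nops(["add", "nop", "nop", "mul"])` yields two ops, with the `add` carrying two fused NOPs; a run of padding NOPs collapses into a single fused NOP.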
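The routing decision in the fastpath microcode sequencer (patent 11467838, application 20190361699) reduces to a classification followed by a two-way dispatch. In the sketch below, the classification is done by counting opcodes in a trace against a threshold; that profiling step, the opcode names, and the function names are assumptions for illustration, since the abstract does not say how instructions are classified.

```python
from collections import Counter


def classify_frequent(trace, min_count):
    """Hypothetical profiling step: opcodes seen at least `min_count` times."""
    counts = Counter(trace)
    return {op for op, n in counts.items() if n >= min_count}


def route(op, frequent):
    # First condition (frequently executed) -> smaller, faster first microcode unit.
    # Second condition (infrequently executed) -> larger second microcode unit.
    return "first_microcode_unit" if op in frequent else "second_microcode_unit"
```

Keeping only the hot microcoded instructions in the small unit lets it stay fast, while rarely used sequences tolerate the latency of the larger store.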
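The usefulness metric in patent 11294724 (application 20210096920) compares a thread's allocated entries with its active entries. The sketch below expresses usefulness as the active-to-allocated ratio and lowers the thread's limit when a large allocation is mostly idle; the ratio form, the thresholds (`large`, `low_usefulness`), the step size, and the floor are all hypothetical tuning choices, not values from the patent.

```python
def usefulness(allocated, active):
    """Fraction of a thread's allocated entries that are actually active."""
    return active / allocated if allocated else 1.0


def adjust_limit(allocated, active, limit,
                 large=16, low_usefulness=0.25, step=4, floor=4):
    """Return the thread's new allocation limit in the shared resource.

    All thresholds are hypothetical tuning parameters."""
    if allocated >= large and usefulness(allocated, active) < low_usefulness:
        # Many entries held but few active: low parallelism, so the thread
        # should run efficiently with fewer entries.
        return max(floor, limit - step)
    return limit
```

A thread holding 32 entries with only 2 active (usefulness 0.0625) would see its limit drop from 48 to 44, freeing entries for sibling threads that can use them.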
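The escalate-or-relax loop in the thread-throttling filings (patent 11169812, applications 20210096873 and 20220058025) can be modeled as a small state machine evaluated once per time period. In this sketch, "more restrictive" halves the selection rate and doubles the period, and "less restrictive" does the reverse; the factor of two, the 1/8 rate floor, and the `ThrottleUnit` name are assumptions.

```python
class ThrottleUnit:
    """Per-thread throttle, created when a shared cache signals contention."""

    def __init__(self, miss_threshold, base_period=100):
        self.miss_threshold = miss_threshold
        self.base_period = base_period
        self.period = base_period  # cycles between evaluations
        self.rate = 1.0            # fraction of cycles the thread may be selected

    def evaluate(self, misses):
        """Called once per period with the thread's cache-miss count."""
        if misses > self.miss_threshold:
            # Over threshold: throttle more restrictively by lowering the
            # selection rate and lengthening the period.
            self.rate = max(0.125, self.rate / 2)
            self.period *= 2
        else:
            # Under threshold: relax the throttle toward full speed.
            self.rate = min(1.0, self.rate * 2)
            self.period = max(self.base_period, self.period // 2)
        return self.rate, self.period
```

With a threshold of 10 misses, two consecutive over-threshold periods take the thread from full rate to 0.5 and then 0.25, and quiet periods step it back up.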
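Retire-queue compression (patent 11144324, applications 20210096874 and 20220027162) packs several instruction operations into one entry when they meet certain conditions. The abstract does not list those conditions, so this sketch stands them in with a single hypothetical `compressible` flag per op and a hypothetical cap of two ops per entry.

```python
def pack_retire_queue(ops, max_per_entry=2):
    """Pack dispatched ops into retire-queue entries.

    `ops` is a list of (name, compressible) pairs in dispatch order; the
    `compressible` flag stands in for the patent's unspecified conditions."""
    entries = []  # each entry: [list_of_op_names, compressible_flag]
    for name, compressible in ops:
        if (compressible and entries and entries[-1][1]
                and len(entries[-1][0]) < max_per_entry):
            entries[-1][0].append(name)  # share the existing entry
        else:
            entries.append([[name], compressible])  # allocate a new entry
    return [names for names, _ in entries]
```

Six ops where only one is non-compressible fit in four entries instead of six, which is the effect the abstract describes: more in-flight speculative work before the retire queue runs out of entries.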
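The soft-watermark mediation in patent 11144353 (application 20210096914) filters threads out of arbitration before a winner is chosen. The sketch below drops any requesting thread whose estimated entry count has reached a watermark, then picks among the rest round-robin; the round-robin policy and the `arbitrate` signature are assumptions. The watermark is "soft" in the sense that it acts on an estimate rather than an exact count.

```python
def arbitrate(requesting, estimated, watermark, last_winner=-1):
    """Pick the thread that may allocate a shared-resource entry this cycle.

    `estimated` maps thread id -> estimated entries held (hypothetical model)."""
    # Threads at or above the watermark are removed from the decision,
    # which stalls them from allocating more entries.
    eligible = sorted(t for t in requesting if estimated[t] < watermark)
    if not eligible:
        return None
    # Round-robin among the remaining threads: first id after the last winner.
    for t in eligible:
        if t > last_winner:
            return t
    return eligible[0]
```

A thread estimated to hold 30 entries against a watermark of 24 simply stops competing until its estimate falls, without any hard partition of the resource.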
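The memory-renaming scheme in patent 11048506 (application 20190310845) identifies store-load pairs with a strong history of store-to-load forwarding and then renames the load to the store's data register. The sketch assumes a table indexed by (store PC, load PC) with saturating confidence counters, a rename threshold of 3, and reset-to-zero on a misprediction; all of these, and the names `MemRenamePredictor` and `rename_load`, are illustrative rather than taken from the patent.

```python
class MemRenamePredictor:
    """Tracks store-load pairs by PC with saturating confidence counters."""

    def __init__(self, threshold=3, max_count=7):
        self.threshold = threshold
        self.max_count = max_count
        self.table = {}  # (store_pc, load_pc) -> confidence

    def train(self, store_pc, load_pc, forwarded):
        key = (store_pc, load_pc)
        if forwarded:
            # Another observed store-to-load forward strengthens the history.
            self.table[key] = min(self.max_count, self.table.get(key, 0) + 1)
        else:
            # Misprediction: drop confidence so the pair stops being renamed.
            self.table[key] = 0

    def should_rename(self, store_pc, load_pc):
        return self.table.get((store_pc, load_pc), 0) >= self.threshold


def rename_load(rename_map, load_dest, store_data_reg):
    """Point the load's destination at the store's physical data register,
    so consumers can read it without waiting for the load/store unit."""
    rename_map[load_dest] = rename_map[store_data_reg]
```

After three observed forwards between the same store and load, the load's destination is mapped onto the physical register that holds the store data; a single misprediction turns renaming off for that pair.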
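The routing rule of the address-generation-bypass-determination unit (patent 11023241, application 20200065108) is simple once "effective address known" is defined. This sketch assumes two such cases: an absolute address with no base register, and a base register whose value is already available at the ABDU. Both cases, the op dictionary layout, and `route_mem_op` are assumptions; the abstract does not say how the address becomes known.

```python
def route_mem_op(op, known_regs):
    """Decide whether a load/store can skip the address-generation stage.

    `op` has a 'base' register name (or None) and an integer 'disp';
    `known_regs` maps register names whose values the ABDU already has."""
    base = op["base"]
    if base is None:
        # Absolute address: the effective address is just the displacement.
        return "bypass", op["disp"]
    if base in known_regs:
        # Base value already available: compute the address early and bypass.
        return "bypass", known_regs[base] + op["disp"]
    # Effective address not known here: route through address generation.
    return "agu", None
```

Only the third case pays for the address-generation stage; the other two can skip it and save its latency.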
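The MEMFILE lookup in patent 10331357 matches loads to earlier stores by base, index, and displacement through a map file. The sketch below models the MdArns as a small pool of slots, the map file as a dict keyed by (base, index, displacement), and adds an `invalidate_reg` method. That invalidation step is an assumption: matching on register names rather than computed addresses is only safe while those registers are unchanged, and the abstract does not describe how this is handled. `MemFile` and its methods are hypothetical names.

```python
class MemFile:
    """Toy MEMFILE: MdArn slots plus a map file keyed by (base, index, disp)."""

    def __init__(self, num_mdarns=8):
        self.free = list(range(num_mdarns))  # unallocated MdArns
        self.data = {}                       # MdArn -> stored value
        self.map = {}                        # (base, index, disp) -> MdArn

    def store(self, base, index, disp, value):
        key = (base, index, disp)
        mdarn = self.map.get(key)
        if mdarn is None:
            if not self.free:
                return None  # no MdArn available; use the normal store path
            mdarn = self.free.pop(0)
            self.map[key] = mdarn  # record the MdArn in the map file
        self.data[mdarn] = value
        return mdarn

    def load(self, base, index, disp):
        mdarn = self.map.get((base, index, disp))
        if mdarn is None:
            return None  # miss: the load goes through the load/store unit
        return self.data[mdarn]  # hit: value supplied from the MdArn

    def invalidate_reg(self, reg):
        # Hypothetical safety step: drop mappings whose base or index
        # register has been overwritten, since the address may have changed.
        for key in [k for k in self.map if reg in (k[0], k[1])]:
            mdarn = self.map.pop(key)
            self.data.pop(mdarn, None)
            self.free.append(mdarn)
```

A store to `[rbp-8]` followed by a load from `[rbp-8]` hits in the map file and returns the stored value directly; once `rbp` is rewritten, the entry is released and the next load misses.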