Patents by Inventor Kevin M. Lepak

Kevin M. Lepak has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

REDUCING CHIPLET WAKEUP LATENCY

Publication number: 20200387208

Abstract: Systems, apparatuses, and methods for reducing chiplet interrupt latency are disclosed. A system includes one or more processing nodes, one or more memory devices, a communication fabric coupled to the processing unit(s) and memory device(s) via link interfaces, and a power management unit. The power management unit manages the power states of the various components and the link interfaces of the system. If the power management unit detects a request to wake up a given component, and the link interface to the given component is powered down, then the power management unit sends an out-of-band signal to wake up the given component in parallel with powering up the link interface. Also, when multiple link interfaces need to be powered up, the power management unit powers up the multiple link interfaces in an order which complies with voltage regulator load-step requirements while minimizing the latency of pending operations.

Type: Application

Filed: May 18, 2020

Publication date: December 10, 2020

Inventors: Benjamin Tsien, Michael J. Tresidder, Ivan Yanfeng Wang, Kevin M. Lepak, Ann Ling, Richard M. Born, John P. Petry, Bryan P. Broussard, Eric Christopher Morton

PLATFORM POWER MANAGER FOR RACK LEVEL POWER AND THERMAL CONSTRAINTS

Publication number: 20200379544

Abstract: Platform power management includes boosting performance in a platform power boost mode or restricting performance to keep a power or temperature under a desired threshold in a platform power cap mode. Platform power management exploits the mutually exclusive nature of activities and the associated headroom created in a temperature and/or power budget of a server platform to boost performance of a particular component while also keeping temperature and/or power below a threshold or budget.

Type: Application

Filed: May 31, 2019

Publication date: December 3, 2020

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Indrani Paul, Sriram Sambamurthy, Larry David Hewitt, Kevin M. Lepak, Samuel D. Naffziger, Adam Neil Calder Clark, Aaron Joseph Grenat, Steven Frederick Liepe, Sandhya Shyamasundar, Wonje Choi, Dana Glenn Lewis, Leonardo de Paula Rosa Piga

Home agent based cache transfer acceleration scheme

Patent number: 10776282

Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.

Type: Grant

Filed: December 15, 2017

Date of Patent: September 15, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
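
The early-probe decision described in the abstract above (patent 10776282) can be illustrated with a small model: look up the region that contains the requested address, and send an early probe only if the entry exists and its confidence indicator exceeds a threshold. This is a rough sketch, not the patented implementation. The region size, the threshold value, and every name (`EarlyProbeCache`, `record`, `should_send_early_probe`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

REGION_BYTES = 4096        # assumed region size; a region spans many cache lines
CONFIDENCE_THRESHOLD = 2   # assumed value; the abstract does not give one


@dataclass
class EarlyProbeEntry:
    owner_node: int   # node recorded as the owner of the region
    confidence: int   # confidence indicator for that prediction


class EarlyProbeCache:
    """Toy region-indexed cache of recent probe filter lookups (hypothetical)."""

    def __init__(self) -> None:
        self._entries: Dict[int, EarlyProbeEntry] = {}

    def record(self, address: int, owner_node: int, confidence: int) -> None:
        # Store the result of a probe filter lookup under the address's region.
        region = address // REGION_BYTES
        self._entries[region] = EarlyProbeEntry(owner_node, confidence)

    def should_send_early_probe(self, address: int) -> Optional[int]:
        """Return the node to probe early, or None if no early probe is sent."""
        entry = self._entries.get(address // REGION_BYTES)
        if entry is not None and entry.confidence > CONFIDENCE_THRESHOLD:
            return entry.owner_node
        return None
```

In this model a request to any address in a recorded region reuses the same entry, which is why a single prior lookup can trigger early probes for neighboring cache lines.
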
LIGHT-WEIGHT MEMORY EXPANSION IN A COHERENT MEMORY SYSTEM

Publication number: 20200226081

Abstract: Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transactions types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.

Type: Application

Filed: January 16, 2019

Publication date: July 16, 2020

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Alexander J. Branover, Kevin M. Lepak

Region based split-directory scheme to adapt to large cache sizes

Patent number: 10705959

Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Type: Grant

Filed: August 31, 2018

Date of Patent: July 7, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan

Reducing chiplet wakeup latency

Patent number: 10656696

Abstract: Systems, apparatuses, and methods for reducing chiplet interrupt latency are disclosed. A system includes one or more processing nodes, one or more memory devices, a communication fabric coupled to the processing unit(s) and memory device(s) via link interfaces, and a power management unit. The power management unit manages the power states of the various components and the link interfaces of the system. If the power management unit detects a request to wake up a given component, and the link interface to the given component is powered down, then the power management unit sends an out-of-band signal to wake up the given component in parallel with powering up the link interface. Also, when multiple link interfaces need to be powered up, the power management unit powers up the multiple link interfaces in an order which complies with voltage regulator load-step requirements while minimizing the latency of pending operations.

Type: Grant

Filed: February 28, 2018

Date of Patent: May 19, 2020

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Benjamin Tsien, Michael J. Tresidder, Ivan Yanfeng Wang, Kevin M. Lepak, Ann Ling, Richard M. Born, John P. Petry, Bryan P. Broussard, Eric Christopher Morton

REGION BASED SPLIT-DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES

Publication number: 20200073801

Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Type: Application

Filed: August 31, 2018

Publication date: March 5, 2020

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan

Tag accelerator for low latency DRAM cache

Patent number: 10545875

Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.

Type: Grant

Filed: December 27, 2017

Date of Patent: January 28, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
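
The filtering effect described in the tag accelerator cache abstract above (patent 10545875) can be sketched as follows: because one data cache access returns a whole block of tags, a lookup for a neighboring tag can be served from the TAC without touching the data cache again. This is only a toy model under stated assumptions. The number of tags per block, the LRU replacement policy, and the names `TagAcceleratorCache` and `lookup` are illustrative and do not come from the patent.

```python
from collections import OrderedDict

TAGS_PER_BLOCK = 8   # assumed number of tags returned by one data cache access


class TagAcceleratorCache:
    """Toy TAC holding recently fetched tag blocks (LRU policy is an assumption)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._blocks = OrderedDict()        # tag block id -> present marker
        self.data_cache_tag_accesses = 0    # tag fetches that reached the data cache

    def lookup(self, set_index: int) -> bool:
        """Return True if the tag block for set_index was already in the TAC."""
        block = set_index // TAGS_PER_BLOCK
        if block in self._blocks:
            self._blocks.move_to_end(block)     # mark as most recently used
            return True
        # Miss: fetch the tag block from the data cache and keep it in the TAC.
        self.data_cache_tag_accesses += 1
        self._blocks[block] = True
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)    # evict the least recently used block
        return False
```

With spatially local accesses, most calls to `lookup` hit in the TAC, so `data_cache_tag_accesses` grows far more slowly than the number of requests.
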
Tag and data organization in large memory caches

Patent number: 10366008

Abstract: A data processing system includes a processor and a cache controller coupled to the processor, and adapted to be coupled to a memory. The cache controller uses the memory to form a pseudo direct mapped cache having a plurality of groups of pages. The memory forms a first number of selected pages, including a first page for storing a plurality of sets of tags and a plurality of remaining pages for storing data. Each tag, of the plurality of sets of tags, stores tags for respective entries in a corresponding one of the plurality of remaining pages.

Type: Grant

Filed: December 12, 2016

Date of Patent: July 30, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak

TAG ACCELERATOR FOR LOW LATENCY DRAM CACHE

Publication number: 20190196974

Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.

Type: Application

Filed: December 27, 2017

Publication date: June 27, 2019

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava

HOME AGENT BASED CACHE TRANSFER ACCELERATION SCHEME

Publication number: 20190188155

Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.

Type: Application

Filed: December 15, 2017

Publication date: June 20, 2019

Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak

REGION BASED DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES

Publication number: 20190188137

Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.

Type: Application

Filed: December 18, 2017

Publication date: June 20, 2019

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
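
The reference-counting behavior in the abstract above (publication 20190188137) can be sketched in a few lines: the directory keeps one entry per region, increments a count when a line in that region is cached, decrements it on eviction, and reclaims the entry when the count reaches zero. This is a minimal illustration rather than the claimed design. The region size and the names `RegionDirectory`, `line_cached`, `line_evicted`, and `tracked_regions` are assumptions.

```python
from typing import Dict

LINES_PER_REGION = 64   # assumed; the abstract only says "multiple cache lines"


class RegionDirectory:
    """Toy directory with one entry per region and a per-entry reference count."""

    def __init__(self) -> None:
        self._ref_counts: Dict[int, int] = {}

    def line_cached(self, line_index: int) -> None:
        # A line in the region was cached somewhere: bump the region's count.
        region = line_index // LINES_PER_REGION
        self._ref_counts[region] = self._ref_counts.get(region, 0) + 1

    def line_evicted(self, line_index: int) -> None:
        region = line_index // LINES_PER_REGION
        if region not in self._ref_counts:
            return   # nothing is tracked for this region
        self._ref_counts[region] -= 1
        if self._ref_counts[region] == 0:
            del self._ref_counts[region]   # reclaim the entry

    def tracked_regions(self) -> int:
        return len(self._ref_counts)
```

Tracking regions instead of individual lines means that caching many lines from the same region costs a single directory entry, which is the entry-count reduction the abstract describes.
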
Opportunistic cache injection of data into lower latency levels of the cache hierarchy

Patent number: 10216640

Abstract: According to one general aspect, a method may include receiving a request, from a non-central processor device that is configured to perform a direct memory access, to write data within a memory system at a memory address. The method may also include determining if a cache tag hit is generated, based upon the memory address, by a caching tier of the memory system that is closer, latency-wise, to a central processor than a coherent memory interconnect. The method may further include if the caching tier generated the cache tag hit, injecting the data into the caching tier.

Type: Grant

Filed: March 9, 2015

Date of Patent: February 26, 2019

Assignee: Samsung Electronics Co., Ltd.

Inventors: Andrew J. Rushing, Kevin M. Lepak
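
The method in the abstract above (patent 10216640) amounts to a conditional write path: on a DMA write, check whether the caching tier close to the processor reports a tag hit for the address, and if so inject the data there. The sketch below models only that decision. Writing to backing memory on a miss is an assumption (the abstract does not describe the miss path), as are the names `MemorySystem` and `dma_write`.

```python
from typing import Dict, Iterable


class MemorySystem:
    """Toy memory system with one caching tier in front of backing memory."""

    def __init__(self, cached_addresses: Iterable[int]) -> None:
        # Addresses whose tags are currently present in the caching tier.
        self.cache: Dict[int, bytes] = {addr: b"" for addr in cached_addresses}
        self.memory: Dict[int, bytes] = {}

    def dma_write(self, address: int, data: bytes) -> str:
        """Handle a write from a DMA-capable device; return where the data went."""
        if address in self.cache:        # cache tag hit in the caching tier
            self.cache[address] = data   # inject the data into the cache
            return "injected"
        self.memory[address] = data      # assumed miss path: write to memory
        return "memory"
```
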
Speculative retirement of post-lock instructions

Patent number: 10095637

Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.

Type: Grant

Filed: September 15, 2016

Date of Patent: October 9, 2018

Assignee: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford

TAG AND DATA ORGANIZATION IN LARGE MEMORY CACHES

Publication number: 20180165202

Abstract: A data processing system includes a processor and a cache controller coupled to the processor, and adapted to be coupled to a memory. The cache controller uses the memory to form a pseudo direct mapped cache having a plurality of groups of pages. The memory forms a first number of selected pages, including a first page for storing a plurality of sets of tags and a plurality of remaining pages for storing data. Each tag, of the plurality of sets of tags, stores tags for respective entries in a corresponding one of the plurality of remaining pages.

Type: Application

Filed: December 12, 2016

Publication date: June 14, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak

SPECULATIVE RETIREMENT OF POST-LOCK INSTRUCTIONS

Publication number: 20180074977

Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.

Type: Application

Filed: September 15, 2016

Publication date: March 15, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford

Removal and optimization of coherence acknowledgement responses in an interconnect

Patent number: 9355034

Abstract: According to one general aspect, a method of performing a cache transaction may include transmitting a cache request to a target device. The method may include receiving a cache response that is associated with the cache request. The method may further include completing the cache transaction without transmitting an exclusive cache response acknowledgement message to the target device.

Type: Grant

Filed: May 9, 2014

Date of Patent: May 31, 2016

Assignee: Samsung Electronics Co., Ltd.

Inventors: Kevin M. Lepak, William A. Hughes

OPPORTUNISTIC CACHE INJECTION OF DATA INTO LOWER LATENCY LEVELS OF THE CACHE HIERARCHY

Publication number: 20150269084

Abstract: According to one general aspect, a method may include receiving a request, from a non-central processor device that is configured to perform a direct memory access, to write data within a memory system at a memory address. The method may also include determining if a cache tag hit is generated, based upon the memory address, by a caching tier of the memory system that is closer, latency-wise, to a central processor than a coherent memory interconnect. The method may further include if the caching tier generated the cache tag hit, injecting the data into the caching tier.

Type: Application

Filed: March 9, 2015

Publication date: September 24, 2015

Inventors: Andrew J. Rushing, Kevin M. Lepak

NETWORK INTERCONNECT WITH REDUCED CONGESTION

Publication number: 20150199286

Abstract: An embodiment includes a system, comprising: an interface; a buffer; and a controller configured to: receive a request through the interface; in a first mode, reserve memory in the buffer for a response to the request if the request is a first type and not reserve memory in the buffer for the response to the request if the request is a second type; and in a second mode, reserve memory in the buffer for the response to the request if the request is the first type or the second type.

Type: Application

Filed: May 21, 2014

Publication date: July 16, 2015

Applicant: Samsung Electronics Co., Ltd.

Inventors: William A. Hughes, Kevin M. Lepak
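
The two-mode reservation rule in the abstract above (publication 20150199286) reduces to a small decision function: in the first mode only first-type requests reserve response space in the buffer, while in the second mode both request types do. The sketch below encodes just that rule. The enum and function names are hypothetical, and the abstract does not say what the request types or modes correspond to in hardware.

```python
from enum import Enum


class Mode(Enum):
    FIRST = 1
    SECOND = 2


class RequestType(Enum):
    FIRST = 1
    SECOND = 2


def reserves_buffer(mode: Mode, request_type: RequestType) -> bool:
    """Return True if response space is reserved in the buffer for this request."""
    if mode is Mode.FIRST:
        # First mode: reserve only for first-type requests.
        return request_type is RequestType.FIRST
    # Second mode: reserve for either request type.
    return True
```
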
REMOVAL AND OPTIMIZATION OF COHERENCE ACKNOWLEDGEMENT RESPONSES IN AN INTERCONNECT

Publication number: 20150186276

Abstract: According to one general aspect, a method of performing a cache transaction may include transmitting a cache request to a target device. The method may include receiving a cache response that is associated with the cache request. The method may further include completing the cache transaction without transmitting an exclusive cache response acknowledgement message to the target device.

Type: Application

Filed: May 9, 2014

Publication date: July 2, 2015

Applicant: Samsung Electronics Co., Ltd.

Inventors: Kevin M. Lepak, William A. Hughes
