Patents by Inventor Kevin M. Lepak
Kevin M. Lepak has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20200387208Abstract: Systems, apparatuses, and methods for reducing chiplet interrupt latency are disclosed. A system includes one or more processing nodes, one or more memory devices, a communication fabric coupled to the processing unit(s) and memory device(s) via link interfaces, and a power management unit. The power management unit manages the power states of the various components and the link interfaces of the system. If the power management unit detects a request to wake up a given component, and the link interface to the given component is powered down, then the power management unit sends an out-of-band signal to wake up the given component in parallel with powering up the link interface. Also, when multiple link interfaces need to be powered up, the power management unit powers up the multiple link interfaces in an order which complies with voltage regulator load-step requirements while minimizing the latency of pending operations.Type: ApplicationFiled: May 18, 2020Publication date: December 10, 2020Inventors: Benjamin Tsien, Michael J. Tresidder, Ivan Yanfeng Wang, Kevin M. Lepak, Ann Ling, Richard M. Born, John P. Petry, Bryan P. Broussard, Eric Christopher Morton
-
Publication number: 20200379544Abstract: Platform power management includes boosting performance in a platform power boost mode or restricting performance to keep a power or temperature under a desired threshold in a platform power cap mode. Platform power management exploits the mutually exclusive nature of activities and the associated headroom created in a temperature and/or power budget of a server platform to boost performance of a particular component while also keeping temperature and/or power below a threshold or budget.Type: ApplicationFiled: May 31, 2019Publication date: December 3, 2020Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Indrani Paul, Sriram Sambamurthy, Larry David Hewitt, Kevin M. Lepak, Samuel D. Naffziger, Adam Neil Calder Clark, Aaron Joseph Grenat, Steven Frederick Liepe, Sandhya Shyamasundar, Wonje Choi, Dana Glenn Lewis, Leonardo de Paula Rosa Piga
-
Patent number: 10776282Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.Type: GrantFiled: December 15, 2017Date of Patent: September 15, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
-
Publication number: 20200226081Abstract: Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transactions types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.Type: ApplicationFiled: January 16, 2019Publication date: July 16, 2020Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Vydhyanathan Kalyanasundharam, Philip Ng, Alexander J. Branover, Kevin M. Lepak
-
Patent number: 10705959Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.Type: GrantFiled: August 31, 2018Date of Patent: July 7, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
-
Patent number: 10656696Abstract: Systems, apparatuses, and methods for reducing chiplet interrupt latency are disclosed. A system includes one or more processing nodes, one or more memory devices, a communication fabric coupled to the processing unit(s) and memory device(s) via link interfaces, and a power management unit. The power management unit manages the power states of the various components and the link interfaces of the system. If the power management unit detects a request to wake up a given component, and the link interface to the given component is powered down, then the power management unit sends an out-of-band signal to wake up the given component in parallel with powering up the link interface. Also, when multiple link interfaces need to be powered up, the power management unit powers up the multiple link interfaces in an order which complies with voltage regulator load-step requirements while minimizing the latency of pending operations.Type: GrantFiled: February 28, 2018Date of Patent: May 19, 2020Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Benjamin Tsien, Michael J. Tresidder, Ivan Yanfeng Wang, Kevin M. Lepak, Ann Ling, Richard M. Born, John P. Petry, Bryan P. Broussard, Eric Christopher Morton
-
Publication number: 20200073801Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.Type: ApplicationFiled: August 31, 2018Publication date: March 5, 2020Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
-
Patent number: 10545875Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.Type: GrantFiled: December 27, 2017Date of Patent: January 28, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
-
Patent number: 10366008Abstract: A data processing system includes a processor and a cache controller coupled to the processor, and adapted to be coupled to a memory. The cache controller uses the memory to form a pseudo direct mapped cache having a plurality of groups of pages. The memory forms a first number of selected pages, including a first page for storing a plurality of sets of tags and a plurality of remaining pages for storing data. Each tag, of the plurality of sets of tags, stores tags for respective entries in a corresponding one of the plurality of remaining pages.Type: GrantFiled: December 12, 2016Date of Patent: July 30, 2019Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
-
Publication number: 20190196974Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.Type: ApplicationFiled: December 27, 2017Publication date: June 27, 2019Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
-
Publication number: 20190188155Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.Type: ApplicationFiled: December 15, 2017Publication date: June 20, 2019Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
-
Publication number: 20190188137Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.Type: ApplicationFiled: December 18, 2017Publication date: June 20, 2019Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
-
Patent number: 10216640Abstract: According to one general aspect, a method may include receiving a request, from a non-central processor device that is configured to perform a direct memory access, to write data within a memory system at a memory address. The method may also include determining if a cache tag hit is generated, based upon the memory address, by a caching tier of the memory system that is closer, latency-wise, to a central processor than a coherent memory interconnect. The method may further include if the caching tier generated the cache tag hit, injecting the data into the caching tier.Type: GrantFiled: March 9, 2015Date of Patent: February 26, 2019Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Andrew J. Rushing, Kevin M. Lepak
-
Patent number: 10095637Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.Type: GrantFiled: September 15, 2016Date of Patent: October 9, 2018Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
-
Publication number: 20180165202Abstract: A data processing system includes a processor and a cache controller coupled to the processor, and adapted to be coupled to a memory. The cache controller uses the memory to form a pseudo direct mapped cache having a plurality of groups of pages. The memory forms a first number of selected pages, including a first page for storing a plurality of sets of tags and a plurality of remaining pages for storing data. Each tag, of the plurality of sets of tags, stores tags for respective entries in a corresponding one of the plurality of remaining pages.Type: ApplicationFiled: December 12, 2016Publication date: June 14, 2018Applicant: Advanced Micro Devices, Inc.Inventors: Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
-
Publication number: 20180074977Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.Type: ApplicationFiled: September 15, 2016Publication date: March 15, 2018Applicant: Advanced Micro Devices, Inc.Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
-
Patent number: 9355034Abstract: According to one general aspect, a method of performing a cache transaction may include transmitting a cache request to a target device. The method may include receiving a cache response that is associated with the cache request. The method may further include completing the cache transaction without transmitting an exclusive cache response acknowledgement message to the target device.Type: GrantFiled: May 9, 2014Date of Patent: May 31, 2016Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Kevin M. Lepak, William A. Hughes
-
Publication number: 20150269084Abstract: According to one general aspect, a method may include receiving a request, from a non-central processor device that is configured to perform a direct memory access, to write data within a memory system at a memory address. The method may also include determining if a cache tag hit is generated, based upon the memory address, by a caching tier of the memory system that is closer, latency-wise, to a central processor than a coherent memory interconnect. The method may further include if the caching tier generated the cache tag hit, injecting the data into the caching tier.Type: ApplicationFiled: March 9, 2015Publication date: September 24, 2015Inventors: Andrew J. RUSHING, Kevin M. LEPAK
-
Publication number: 20150199286Abstract: An embodiment includes a system, comprising: an interface; a buffer; and a controller configured to: receive a request through the interface; in a first mode, reserve memory in the buffer for a response to the request if the request is a first type and not reserve memory in the buffer for the response to the request if the request is a second type; and in a second mode, reserve memory in the buffer for the response to the request if the request is the first type or the second type.Type: ApplicationFiled: May 21, 2014Publication date: July 16, 2015Applicant: Samsung Electronics Co., Ltd.Inventors: William A. HUGHES, Kevin M. LEPAK
-
Publication number: 20150186276Abstract: According to one general aspect, a method of performing a cache transaction may include transmitting a cache request to a target device. The method may include receiving a cache response that is associated with the cache request. The method may further include completing the cache transaction without transmitting an exclusive cache response acknowledgement message to the target device.Type: ApplicationFiled: May 9, 2014Publication date: July 2, 2015Applicant: Samsung Electronics Co., Ltd.Inventors: Kevin M. LEPAK, William A. HUGHES