Patents by Inventor Winnie W. Yeung
Winnie W. Yeung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12360899
Abstract: Techniques are disclosed relating to graphics processor data caches. In some embodiments, datapath circuitry executes instructions that operate on input operands from architectural registers. Data cache circuitry caches architectural register data for the datapath circuitry. Scoreboard circuitry tracks, for a given architectural register: map information that indicates whether the architectural register is mapped to an entry of the data cache circuitry and a pointer to the entry of the data cache circuitry. Tiered scoreboard circuitry and data storage circuitry may be implemented (e.g., to provide fast scoreboard access for active threads and to give a landing spot for long-latency data retrieval operations). Various disclosed techniques may improve cache performance, reduce power consumption, reduce area, or some combination thereof.
Type: Grant
Filed: January 11, 2024
Date of Patent: July 15, 2025
Assignee: Apple Inc.
Inventors: Winnie W. Yeung, Zelin Zhang, Cheng Li, Hungse Cha, Leela Kishore Kothamasu
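The abstract describes per-register scoreboard state: a mapped flag plus a pointer into the data cache. The following is a minimal Python sketch of that bookkeeping only, not the patented circuit; the class name, the free-list allocation, and the dict-based records are illustrative assumptions.

```python
# Software model of a register-cache scoreboard: for each architectural
# register it tracks whether the register is mapped to a data-cache entry
# and, if so, a pointer (index) to that entry. Illustrative only.

class RegisterScoreboard:
    def __init__(self, num_cache_entries):
        self.free_entries = list(range(num_cache_entries))
        # per-register record: {"mapped": bool, "entry": cache-entry index}
        self.records = {}

    def lookup(self, reg):
        """Return the cache entry holding `reg`, or None on a miss."""
        rec = self.records.get(reg)
        if rec and rec["mapped"]:
            return rec["entry"]
        return None

    def map_register(self, reg):
        """Bind `reg` to a free data-cache entry and record the pointer."""
        if not self.free_entries:
            raise RuntimeError("no free entries; unmap a victim first")
        entry = self.free_entries.pop()
        self.records[reg] = {"mapped": True, "entry": entry}
        return entry

    def unmap_register(self, reg):
        """Release the entry, e.g. when its cache line is evicted."""
        rec = self.records.pop(reg, None)
        if rec and rec["mapped"]:
            self.free_entries.append(rec["entry"])
```

The tiered scoreboard mentioned in the abstract would layer a small fast structure for active threads over this; that tier is omitted here.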
-
Publication number: 20250104181
Abstract: Techniques are disclosed relating to data compression in graphics processors. In some embodiments, cache circuitry is coupled to shader processor circuitry and is configured to store graphics data that includes a compressed block of data associated with a surface and metadata for the compressed block of data. Metadata coherence circuitry may cache the metadata for the compressed block of data, receive an indication of a write command for non-compressed data associated with the surface, wherein the write command identifies the metadata and has a different address than the compressed block of data, and determine, based on the metadata and the indication, to invalidate the compressed block of data in the cache circuitry. This may maintain read/write coherence in a cache that stores both compressed and uncompressed data, in some embodiments.
Type: Application
Filed: August 6, 2024
Publication date: March 27, 2025
Inventors: Karthik Ramani, Tyson J. Bergland, Leela Kishore Kothamasu, Hongzhou Zhao, Winnie W. Yeung, Mladen Wilder
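A hedged Python sketch of the coherence rule in the abstract: when an uncompressed write hits a surface whose compressed block is cached at a different address, the compressed copy is stale and gets invalidated. The class name and the surface-keyed metadata dict are assumptions for illustration.

```python
# Software analogue of metadata-driven invalidation for a cache that holds
# both compressed and uncompressed data for the same surface.

class MetadataCoherence:
    def __init__(self):
        self.cache = {}     # address -> cached block (compressed or not)
        self.metadata = {}  # surface id -> address of its compressed block

    def cache_compressed(self, surface, addr, block):
        self.cache[addr] = block
        self.metadata[surface] = addr

    def write_uncompressed(self, surface, addr, data):
        comp_addr = self.metadata.get(surface)
        # Same surface, different address: the cached compressed block no
        # longer reflects the surface contents, so invalidate it.
        if comp_addr is not None and comp_addr != addr:
            self.cache.pop(comp_addr, None)
            self.metadata.pop(surface, None)
        self.cache[addr] = data
```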
-
Publication number: 20250103493
Abstract: Techniques are disclosed relating to graphics processor data caches. In some embodiments, datapath circuitry executes instructions that operate on input operands from architectural registers. Data cache circuitry caches architectural register data for the datapath circuitry. Scoreboard circuitry tracks, for a given architectural register: map information that indicates whether the architectural register is mapped to an entry of the data cache circuitry and a pointer to the entry of the data cache circuitry. Tiered scoreboard circuitry and data storage circuitry may be implemented (e.g., to provide fast scoreboard access for active threads and to give a landing spot for long-latency data retrieval operations). Various disclosed techniques may improve cache performance, reduce power consumption, reduce area, or some combination thereof.
Type: Application
Filed: January 11, 2024
Publication date: March 27, 2025
Inventors: Winnie W. Yeung, Zelin Zhang, Cheng Li, Hungse Cha, Leela Kishore Kothamasu
-
Publication number: 20250094357
Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
Type: Application
Filed: November 27, 2024
Publication date: March 20, 2025
Inventors: Jonathan M. Redshaw, Winnie W. Yeung, Benjiman L. Goodman, David K. Li, Zelin Zhang, Yoong Chert Foo
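An illustrative Python sketch of the lock-indicator mechanism: decode asserts locks for registers its instructions will use, eviction is refused while a lock is asserted, and a reset event clears the whole set at once. Names and the set-based representation are assumptions, not the hardware design.

```python
# Lock bits that pin cache lines holding register operands needed by
# decoded-but-not-yet-executed instructions of a thread.

class LockedRegisterCache:
    def __init__(self):
        self.lines = {}      # register -> cached operand data
        self.locked = set()  # registers whose lines must not be evicted

    def fill(self, reg, data):
        self.lines[reg] = data

    def on_decode(self, used_registers):
        """Decode circuitry reports registers an instruction will use."""
        self.locked.update(used_registers)

    def try_evict(self, reg):
        """Eviction is refused while the lock indicator is asserted."""
        if reg in self.locked:
            return False
        self.lines.pop(reg, None)
        return True

    def on_reset_event(self):
        """Clear the entire set of lock indicators in one step."""
        self.locked.clear()
```

Clearing locks in bulk on a reset event, rather than tracking per-instruction release, is what keeps the control area small in the abstract's framing.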
-
Patent number: 12248399
Abstract: Techniques are disclosed relating to multi-block fetches for cache misses. In some embodiments, cache tag circuitry maintains a tag value that is shared by multiple cache blocks. In response to a miss, the cache may initiate a fetch request to a next level cache or memory. Aggregation circuitry may aggregate multiple fetch requests for cache blocks that share the tag value and fetch circuitry may initiate a single multi-block fetch operation to the next level cache or memory that returns cache blocks for the aggregated multiple fetch requests. In various embodiments, disclosed techniques may improve performance (e.g., by reducing fetch bus transactions), reduce power consumption, or both, relative to traditional techniques.
Type: Grant
Filed: May 19, 2021
Date of Patent: March 11, 2025
Assignee: Apple Inc.
Inventors: Winnie W. Yeung, Cheng Li
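A minimal Python sketch of the aggregation idea: misses to blocks that share one tag are collected so that a single multi-block fetch goes to the next level instead of one bus transaction per block. The class, the drain model, and the fetch callback are illustrative assumptions.

```python
# Aggregate per-block fetch requests that share a cache tag into one
# multi-block fetch to the next-level cache or memory.

class FetchAggregator:
    def __init__(self, fetch_fn):
        self.pending = {}        # shared tag -> set of missed block indices
        self.fetch_fn = fetch_fn # issues one multi-block fetch

    def on_miss(self, tag, block_index):
        self.pending.setdefault(tag, set()).add(block_index)

    def drain(self):
        """One fetch per shared tag, covering all aggregated blocks."""
        for tag, blocks in self.pending.items():
            self.fetch_fn(tag, sorted(blocks))
        self.pending.clear()

# Example: two misses under tag 0x40 become a single two-block fetch.
agg = FetchAggregator(lambda tag, blocks: print(f"fetch tag={tag:#x} blocks={blocks}"))
agg.on_miss(0x40, 0)
agg.on_miss(0x40, 2)
agg.drain()  # -> fetch tag=0x40 blocks=[0, 2]
```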
-
Patent number: 12182037
Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
Type: Grant
Filed: February 23, 2023
Date of Patent: December 31, 2024
Assignee: Apple Inc.
Inventors: Jonathan M. Redshaw, Winnie W. Yeung, Benjiman L. Goodman, David K. Li, Zelin Zhang, Yoong Chert Foo
-
Publication number: 20240289282
Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
Type: Application
Filed: February 23, 2023
Publication date: August 29, 2024
Inventors: Jonathan M. Redshaw, Winnie W. Yeung, Benjiman L. Goodman, David K. Li, Zelin Zhang, Yoong Chert Foo
-
Patent number: 11947462
Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
Type: Grant
Filed: March 3, 2022
Date of Patent: April 2, 2024
Assignee: Apple Inc.
Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
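A hedged Python sketch of the feedback loop described here: when a cache performance metric (miss rate is used below as a stand-in) crosses a threshold, the limit on threads considered for arbitration shrinks, which shrinks the cache footprint. The thresholds, step sizes, and hysteresis are invented for illustration.

```python
# Throttle the thread-arbitration limit based on a cache miss-rate metric
# to keep the cache footprint under control.

class ThreadThrottle:
    def __init__(self, max_threads, miss_rate_threshold=0.5):
        self.max_threads = max_threads
        self.limit = max_threads          # threads eligible for arbitration
        self.threshold = miss_rate_threshold

    def update(self, misses, accesses):
        miss_rate = misses / max(accesses, 1)
        if miss_rate > self.threshold and self.limit > 1:
            self.limit -= 1               # likely thrashing: consider fewer threads
        elif miss_rate < self.threshold / 2 and self.limit < self.max_threads:
            self.limit += 1               # healthy: restore parallelism

    def schedulable(self, ready_threads):
        """Scheduler arbitrates only among the first `limit` threads."""
        return ready_threads[: self.limit]
```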
-
Patent number: 11842436
Abstract: Techniques are disclosed relating to arbitration for computer memory resources. In some embodiments, an apparatus includes queue circuitry that implements multiple queues configured to queue requests to access a memory bus. Control circuitry may, in response to detecting a first threshold condition associated with the queue circuitry, generate a first snapshot that indicates numbers of requests in respective queues of the multiple queues at a first time. The control circuitry may generate a second snapshot that indicates numbers of requests in respective queues of the multiple queues at a second time that is subsequent to the first time. The control circuitry may arbitrate between requests from the multiple queues to select requests to access the memory bus, where the arbitration is based on snapshots to which requests from the multiple queues belong. Disclosed techniques may approximate age-based scheduling while reducing area and power consumption.
Type: Grant
Filed: August 1, 2022
Date of Patent: December 12, 2023
Assignee: Apple Inc.
Inventors: Winnie W. Yeung, Leela Kishore Kothamasu, Zelin Zhang, Guanlan Xu, Eddie M. Robinson
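An illustrative Python model of snapshot-based arbitration: rather than tracking an exact age per request, each request is tagged with the snapshot that was current when it was queued, and arbitration grants the oldest snapshot first. How ties are broken within a snapshot, and when snapshots are taken, are assumptions beyond the abstract.

```python
# Approximate age-based scheduling: group queued requests by snapshot and
# grant requests belonging to the oldest snapshot first.

from collections import deque

class SnapshotArbiter:
    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]
        self.snapshot_id = 0

    def enqueue(self, queue_idx, request):
        # Tag the request with the snapshot current at enqueue time.
        self.queues[queue_idx].append((self.snapshot_id, request))

    def take_snapshot(self):
        """Called when a threshold condition on the queues is detected."""
        self.snapshot_id += 1

    def arbitrate(self):
        """Grant a request from the queue whose head is oldest by snapshot."""
        best = None
        for q in self.queues:
            if q and (best is None or q[0][0] < best[0][0]):
                best = q
        return best.popleft()[1] if best else None
```

Storing one small snapshot counter per request, instead of a full age, is what trades a little scheduling precision for the area and power savings the abstract claims.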
-
Publication number: 20220375161
Abstract: Techniques are disclosed relating to arbitration for computer memory resources. In some embodiments, an apparatus includes queue circuitry that implements multiple queues configured to queue requests to access a memory bus. Control circuitry may, in response to detecting a first threshold condition associated with the queue circuitry, generate a first snapshot that indicates numbers of requests in respective queues of the multiple queues at a first time. The control circuitry may generate a second snapshot that indicates numbers of requests in respective queues of the multiple queues at a second time that is subsequent to the first time. The control circuitry may arbitrate between requests from the multiple queues to select requests to access the memory bus, where the arbitration is based on snapshots to which requests from the multiple queues belong. Disclosed techniques may approximate age-based scheduling while reducing area and power consumption.
Type: Application
Filed: August 1, 2022
Publication date: November 24, 2022
Inventors: Winnie W. Yeung, Leela Kishore Kothamasu, Zelin Zhang, Guanlan Xu, Eddie M. Robinson
-
Publication number: 20220374359
Abstract: Techniques are disclosed relating to multi-block fetches for cache misses. In some embodiments, cache tag circuitry maintains a tag value that is shared by multiple cache blocks. In response to a miss, the cache may initiate a fetch request to a next level cache or memory. Aggregation circuitry may aggregate multiple fetch requests for cache blocks that share the tag value and fetch circuitry may initiate a single multi-block fetch operation to the next level cache or memory that returns cache blocks for the aggregated multiple fetch requests. In various embodiments, disclosed techniques may improve performance (e.g., by reducing fetch bus transactions), reduce power consumption, or both, relative to traditional techniques.
Type: Application
Filed: May 19, 2021
Publication date: November 24, 2022
Inventors: Winnie W. Yeung, Cheng Li
-
Patent number: 11488350
Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, a memory system implements a storage hierarchy that includes first cache circuitry and second cache circuitry at different levels of the hierarchy. Processor circuitry generates write data to be written to the memory system. In some embodiments, first compression circuitry is configured to compress a first block of write data in response to full accumulation of the first block in the first cache circuitry and second compression circuitry is configured to compress a second block of write data in response to full accumulation of the second block in the second cache circuitry. Write circuitry may write the first and second compressed blocks of data in a single combined write to a higher level in the storage hierarchy.
Type: Grant
Filed: June 4, 2021
Date of Patent: November 1, 2022
Assignee: Apple Inc.
Inventors: Anthony P. DeLaurier, Karl D. Mann, Tyson J. Bergland, Winnie W. Yeung
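A rough Python sketch of the accumulate-then-compress flow, heavily simplified: each cache level buffers write data, compresses a block only once it has fully accumulated, and blocks from two levels can be emitted in one combined upstream write. zlib stands in for whatever hardware compression the design actually uses; the block size and helper names are assumptions.

```python
# Per-level compression on full block accumulation, with a single combined
# write of two compressed blocks to the next level of the hierarchy.

import zlib

BLOCK_SIZE = 64  # bytes; illustrative, not from the patent

class CompressingLevel:
    def __init__(self):
        self.buf = bytearray()

    def write(self, data):
        """Accumulate writes; return a compressed block once one is full."""
        self.buf += data
        if len(self.buf) >= BLOCK_SIZE:
            block = bytes(self.buf[:BLOCK_SIZE])
            del self.buf[:BLOCK_SIZE]
            return zlib.compress(block)
        return None

def combined_write(upstream, block_a, block_b):
    """One combined write carrying two independently compressed blocks."""
    if block_a is not None and block_b is not None:
        upstream.append((block_a, block_b))
```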
-
Patent number: 11467959
Abstract: Techniques are disclosed relating to caching for address translation. In some embodiments, address translation circuitry is configured to process requests to translate addresses in a first address space to addresses in a second address space. The translation circuitry may include cache circuitry configured to store translation information, arbitration circuitry configured to arbitrate among ready requests for access to entries of the cache, and hazard circuitry. The hazard circuitry may assign a first request a ready status for the arbitration circuitry based on detection of an absence of hazards for a first address of the first request, and add a second request to a queue of requests for the arbitration circuitry based on detection of a hazard for a second address of the second request. Independent arbitration for requests without hazards may improve performance in various aspects, relative to traditional techniques.
Type: Grant
Filed: May 19, 2021
Date of Patent: October 11, 2022
Assignee: Apple Inc.
Inventors: Winnie W. Yeung, Cheng Li
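A minimal Python sketch of the hazard split in the abstract: a translation request with no address hazard goes straight into the ready pool the arbiter draws from, while a hazarded request waits in a queue until the conflicting address drains. The in-flight set and the resubmission scheme are assumptions about structure the abstract leaves open.

```python
# Route hazard-free translation requests to independent arbitration;
# queue requests whose address conflicts with one already in flight.

from collections import deque

class TranslationHazardTracker:
    def __init__(self):
        self.in_flight = set()  # addresses currently being translated
        self.ready = []         # hazard-free requests, arbitrated freely
        self.waiting = deque()  # requests stalled behind an address hazard

    def submit(self, request, addr):
        if addr in self.in_flight:
            self.waiting.append((request, addr))  # hazard detected
        else:
            self.in_flight.add(addr)
            self.ready.append((request, addr))    # eligible for arbitration

    def complete(self, addr):
        """The address is no longer hazardous; re-submit its waiters."""
        self.in_flight.discard(addr)
        pending = list(self.waiting)
        self.waiting.clear()
        for req, a in pending:
            self.submit(req, a)  # re-sorts into ready or waiting
```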
-
Patent number: 11443479
Abstract: Techniques are disclosed relating to arbitration for computer memory resources. In some embodiments, an apparatus includes queue circuitry that implements multiple queues configured to queue requests to access a memory bus. Control circuitry may, in response to detecting a first threshold condition associated with the queue circuitry, generate a first snapshot that indicates numbers of requests in respective queues of the multiple queues at a first time. The control circuitry may generate a second snapshot that indicates numbers of requests in respective queues of the multiple queues at a second time that is subsequent to the first time. The control circuitry may arbitrate between requests from the multiple queues to select requests to access the memory bus, where the arbitration is based on snapshots to which requests from the multiple queues belong. Disclosed techniques may approximate age-based scheduling while reducing area and power consumption.
Type: Grant
Filed: May 19, 2021
Date of Patent: September 13, 2022
Assignee: Apple Inc.
Inventors: Winnie W. Yeung, Leela Kishore Kothamasu, Zelin Zhang, Guanlan Xu, Eddie M. Robinson
-
Publication number: 20210295593
Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, a memory system implements a storage hierarchy that includes first cache circuitry and second cache circuitry at different levels of the hierarchy. Processor circuitry generates write data to be written to the memory system. In some embodiments, first compression circuitry is configured to compress a first block of write data in response to full accumulation of the first block in the first cache circuitry and second compression circuitry is configured to compress a second block of write data in response to full accumulation of the second block in the second cache circuitry. Write circuitry may write the first and second compressed blocks of data in a single combined write to a higher level in the storage hierarchy.
Type: Application
Filed: June 4, 2021
Publication date: September 23, 2021
Inventors: Anthony P. DeLaurier, Karl D. Mann, Tyson J. Bergland, Winnie W. Yeung
-
Patent number: 11062507
Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, programmable shader circuitry is configured to execute program instructions of compute kernels that write pixel data. In some embodiments, a first cache is configured to store pixel write data from the programmable shader circuitry and first compression circuitry is configured to compress a first block of pixel write data in response to full accumulation of the first block in the first cache circuitry. In some embodiments, second cache circuitry is configured to store pixel write data from the programmable shader circuitry at a higher level in a storage hierarchy than the first cache circuitry and second compression circuitry is configured to compress a second block of pixel write data in response to full accumulation of the second block in the second cache circuitry.
Type: Grant
Filed: November 4, 2019
Date of Patent: July 13, 2021
Assignee: Apple Inc.
Inventors: Anthony P. DeLaurier, Karl D. Mann, Tyson J. Bergland, Winnie W. Yeung
-
Publication number: 20210134052
Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, programmable shader circuitry is configured to execute program instructions of compute kernels that write pixel data. In some embodiments, a first cache is configured to store pixel write data from the programmable shader circuitry and first compression circuitry is configured to compress a first block of pixel write data in response to full accumulation of the first block in the first cache circuitry. In some embodiments, second cache circuitry is configured to store pixel write data from the programmable shader circuitry at a higher level in a storage hierarchy than the first cache circuitry and second compression circuitry is configured to compress a second block of pixel write data in response to full accumulation of the second block in the second cache circuitry.
Type: Application
Filed: November 4, 2019
Publication date: May 6, 2021
Inventors: Anthony P. DeLaurier, Karl D. Mann, Tyson J. Bergland, Winnie W. Yeung
-
Patent number: 10970223
Abstract: Systems, apparatuses, and methods for efficiently allocating data in a cache are described. In various embodiments, a processor decodes an indication in a software application identifying a temporal data set. The data set is flagged with a data set identifier (DSID) indicating temporal data to drop after consumption. When the data set is allocated in a cache, the data set is stored with a non-replaceable attribute to prevent a cache replacement policy from evicting the data set before it is dropped. A drop command with an indication of the DSID of the data set is later issued after the data set is read (consumed). A copy of the data set is not written back to the lower-level memory although the data set is removed from the cache. An interrupt is generated to notify firmware or other software of the completion of the drop command.
Type: Grant
Filed: May 13, 2019
Date of Patent: April 6, 2021
Assignee: Apple Inc.
Inventors: Wolfgang H. Klingauf, Kenneth C. Dyke, Karthik Ramani, Winnie W. Yeung, Anthony P. DeLaurier, Luc R. Semeria, David A. Gotwalt, Srinivasa Rangan Sridharan, Muditha Kanchana
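An illustrative Python model of the drop-on-consume flow: a data set tagged with a DSID is allocated with a non-replaceable attribute, a later drop command removes it without any write-back, and a callback stands in for the completion interrupt. Every identifier below is an assumption made for the sketch, not the patented design.

```python
# DSID-tagged temporal data: pin it against replacement while live, then
# drop it after consumption with no write-back to lower-level memory.

class DropCache:
    def __init__(self, backing_store, on_drop_complete):
        self.lines = {}              # DSID -> cached data set
        self.non_replaceable = set() # DSIDs the replacement policy must skip
        self.backing = backing_store # deliberately untouched by drop()
        self.on_drop_complete = on_drop_complete

    def allocate(self, dsid, data):
        self.lines[dsid] = data
        self.non_replaceable.add(dsid)  # protect until explicitly dropped

    def evict_candidates(self):
        """Replacement policy may only consider replaceable lines."""
        return [d for d in self.lines if d not in self.non_replaceable]

    def drop(self, dsid):
        """Remove the consumed data set; note that self.backing is never
        written, modeling the skipped write-back."""
        self.lines.pop(dsid, None)
        self.non_replaceable.discard(dsid)
        self.on_drop_complete(dsid)     # stands in for the interrupt
```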
-
Publication number: 20190266102
Abstract: Systems, apparatuses, and methods for efficiently allocating data in a cache are described. In various embodiments, a processor decodes an indication in a software application identifying a temporal data set. The data set is flagged with a data set identifier (DSID) indicating temporal data to drop after consumption. When the data set is allocated in a cache, the data set is stored with a non-replaceable attribute to prevent a cache replacement policy from evicting the data set before it is dropped. A drop command with an indication of the DSID of the data set is later issued after the data set is read (consumed). A copy of the data set is not written back to the lower-level memory although the data set is removed from the cache. An interrupt is generated to notify firmware or other software of the completion of the drop command.
Type: Application
Filed: May 13, 2019
Publication date: August 29, 2019
Inventors: Wolfgang H. Klingauf, Kenneth C. Dyke, Karthik Ramani, Winnie W. Yeung, Anthony P. DeLaurier, Luc R. Semeria, David A. Gotwalt, Srinivasa Rangan Sridharan, Muditha Kanchana
-
Patent number: 10289565
Abstract: Systems, apparatuses, and methods for efficiently allocating data in a cache are described. In various embodiments, a processor decodes an indication in a software application identifying a temporal data set. The data set is flagged with a data set identifier (DSID) indicating temporal data to drop after consumption. When the data set is allocated in a cache, the data set is stored with a non-replaceable attribute to prevent a cache replacement policy from evicting the data set before it is dropped. A drop command with an indication of the DSID of the data set is later issued after the data set is read (consumed). A copy of the data set is not written back to the lower-level memory although the data set is removed from the cache. An interrupt is generated to notify firmware or other software of the completion of the drop command.
Type: Grant
Filed: May 31, 2017
Date of Patent: May 14, 2019
Assignee: Apple Inc.
Inventors: Wolfgang H. Klingauf, Kenneth C. Dyke, Karthik Ramani, Winnie W. Yeung, Anthony P. DeLaurier, Luc R. Semeria, David A. Gotwalt, Srinivasa Rangan Sridharan, Muditha Kanchana