Patents by Inventor Winnie W. Yeung

Winnie W. Yeung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11947462
    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
    Type: Grant
    Filed: March 3, 2022
    Date of Patent: April 2, 2024
    Assignee: Apple Inc.
    Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
  • Patent number: 11842436
    Abstract: Techniques are disclosed relating to arbitration for computer memory resources. In some embodiments, an apparatus includes queue circuitry that implements multiple queues configured to queue requests to access a memory bus. Control circuitry may, in response to detecting a first threshold condition associated with the queue circuitry, generate a first snapshot that indicates numbers of requests in respective queues of the multiple queues at a first time. The control circuitry may generate a second snapshot that indicates numbers of requests in respective queues of the multiple queues at a second time that is subsequent to the first time. The control circuitry may arbitrate between requests from the multiple queues to select requests to access the memory bus, where the arbitration is based on snapshots to which requests from the multiple queues belong. Disclosed techniques may approximate age-based scheduling while reducing area and power consumption.
    Type: Grant
    Filed: August 1, 2022
    Date of Patent: December 12, 2023
    Assignee: Apple Inc.
    Inventors: Winnie W. Yeung, Leela Kishore Kothamasu, Zelin Zhang, Guanlan Xu, Eddie M. Robinson
  • Publication number: 20220374359
    Abstract: Techniques are disclosed relating to multi-block fetches for cache misses. In some embodiments, cache tag circuitry maintains a tag value that is shared by multiple cache blocks. In response to a miss, the cache may initiate a fetch request to a next level cache or memory. Aggregation circuitry may aggregate multiple fetch requests for cache blocks that share the tag value and fetch circuitry may initiate a single multi-block fetch operation to the next level cache or memory that returns cache blocks for the aggregated multiple fetch requests. In various embodiments, disclosed techniques may improve performance (e.g., by reducing fetch bus transactions), reduce power consumption, or both, relative to traditional techniques.
    Type: Application
    Filed: May 19, 2021
    Publication date: November 24, 2022
    Inventors: Winnie W. Yeung, Cheng Li
  • Publication number: 20220375161
    Abstract: Techniques are disclosed relating to arbitration for computer memory resources. In some embodiments, an apparatus includes queue circuitry that implements multiple queues configured to queue requests to access a memory bus. Control circuitry may, in response to detecting a first threshold condition associated with the queue circuitry, generate a first snapshot that indicates numbers of requests in respective queues of the multiple queues at a first time. The control circuitry may generate a second snapshot that indicates numbers of requests in respective queues of the multiple queues at a second time that is subsequent to the first time. The control circuitry may arbitrate between requests from the multiple queues to select requests to access the memory bus, where the arbitration is based on snapshots to which requests from the multiple queues belong. Disclosed techniques may approximate age-based scheduling while reducing area and power consumption.
    Type: Application
    Filed: August 1, 2022
    Publication date: November 24, 2022
    Inventors: Winnie W. Yeung, Leela Kishore Kothamasu, Zelin Zhang, Guanlan Xu, Eddie M. Robinson
  • Patent number: 11488350
    Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, a memory system implements a storage hierarchy that includes first cache circuitry and second cache circuitry at different levels of the hierarchy. Processor circuitry generates write data to be written to the memory system. In some embodiments, first compression circuitry is configured to compress a first block of write data in response to full accumulation of the first block in the first cache circuitry and second compression circuitry is configured to compress a second block of write data in response to full accumulation of the second block in the second cache circuitry. Write circuitry may write the first and second compressed blocks of data in a single combined write to a higher level in the storage hierarchy.
    Type: Grant
    Filed: June 4, 2021
    Date of Patent: November 1, 2022
    Assignee: Apple Inc.
    Inventors: Anthony P. DeLaurier, Karl D. Mann, Tyson J. Bergland, Winnie W. Yeung
  • Patent number: 11467959
    Abstract: Techniques are disclosed relating to caching for address translation. In some embodiments, address translation circuitry is configured to process requests to translate addresses in a first address space to addresses in a second address space. The translation circuitry may include cache circuitry configured to store translation information, arbitration circuitry configured to arbitrate among ready requests for access to entries of the cache, and hazard circuitry. The hazard circuitry may assign a first request to an ready status the arbitration circuitry based on detection of an absence of hazards for a first address of the first request and add a second request to a queue of requests for the arbitration circuitry based on detection of a hazard for a second address of the second request. Independent arbitration for requests without hazards may improve performance in various aspects, relative to traditional techniques.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: October 11, 2022
    Assignee: Apple Inc.
    Inventors: Winnie W. Yeung, Cheng Li
  • Patent number: 11443479
    Abstract: Techniques are disclosed relating to arbitration for computer memory resources. In some embodiments, an apparatus includes queue circuitry that implements multiple queues configured to queue requests to access a memory bus. Control circuitry may, in response to detecting a first threshold condition associated with the queue circuitry, generate a first snapshot that indicates numbers of requests in respective queues of the multiple queues at a first time. The control circuitry may generate a second snapshot that indicates numbers of requests in respective queues of the multiple queues at a second time that is subsequent to the first time. The control circuitry may arbitrate between requests from the multiple queues to select requests to access the memory bus, where the arbitration is based on snapshots to which requests from the multiple queues belong. Disclosed techniques may approximate age-based scheduling while reducing area and power consumption.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: September 13, 2022
    Assignee: Apple Inc.
    Inventors: Winnie W. Yeung, Leela Kishore Kothamasu, Zelin Zhang, Guanlan Xu, Eddie M. Robinson
  • Publication number: 20210295593
    Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, a memory system implements a storage hierarchy that includes first cache circuitry and second cache circuitry at different levels of the hierarchy. Processor circuitry generates write data to be written to the memory system. In some embodiments, first compression circuitry is configured to compress a first block of write data in response to full accumulation of the first block in the first cache circuitry and second compression circuitry is configured to compress a second block of write data in response to full accumulation of the second block in the second cache circuitry. Write circuitry may write the first and second compressed blocks of data in a single combined write to a higher level in the storage hierarchy.
    Type: Application
    Filed: June 4, 2021
    Publication date: September 23, 2021
    Inventors: Anthony P. DeLaurier, Karl D. Mann, Tyson J. Bergland, Winnie W. Yeung
  • Patent number: 11062507
    Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, programmable shader circuitry is configured to execute program instructions of compute kernels that write pixel data. In some embodiments, a first cache is configured to store pixel write data from the programmable shader circuitry and first compression circuitry is configured to compress a first block of pixel write data in response to full accumulation of the first block in the first cache circuitry. In some embodiments, second cache circuitry is configured to store pixel write data from the programmable shader circuitry at a higher level in a storage hierarchy than the first cache circuitry and second compression circuitry is configured to compress a second block of pixel write data in response to full accumulation of the second block in the second cache circuitry.
    Type: Grant
    Filed: November 4, 2019
    Date of Patent: July 13, 2021
    Assignee: Apple Inc.
    Inventors: Anthony P. DeLaurier, Karl D. Mann, Tyson J. Bergland, Winnie W. Yeung
  • Publication number: 20210134052
    Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, programmable shader circuitry is configured to execute program instructions of compute kernels that write pixel data. In some embodiments, a first cache is configured to store pixel write data from the programmable shader circuitry and first compression circuitry is configured to compress a first block of pixel write data in response to full accumulation of the first block in the first cache circuitry. In some embodiments, second cache circuitry is configured to store pixel write data from the programmable shader circuitry at a higher level in a storage hierarchy than the first cache circuitry and second compression circuitry is configured to compress a second block of pixel write data in response to full accumulation of the second block in the second cache circuitry.
    Type: Application
    Filed: November 4, 2019
    Publication date: May 6, 2021
    Inventors: Anthony P. DeLaurier, Karl D. Mann, Tyson J. Bergland, Winnie W. Yeung
  • Patent number: 10970223
    Abstract: Systems, apparatuses, and methods for efficiently allocating data in a cache are described. In various embodiments, a processor decodes an indication in a software application identifying a temporal data set. The data set is flagged with a data set identifier (DSID) indicating temporal data to drop after consumption. When the data set is allocated in a cache, the data set is stored with a non-replaceable attribute to prevent a cache replacement policy from evicting the data set before it is dropped. A drop command with an indication of the DSID of the data set is later issued after the data set is read (consumed). A copy of the data set is not written back to the lower-level memory although the data set is removed from the cache. An interrupt is generated to notify firmware or other software of the completion of the drop command.
    Type: Grant
    Filed: May 13, 2019
    Date of Patent: April 6, 2021
    Assignee: Apple Inc.
    Inventors: Wolfgang H. Klingauf, Kenneth C. Dyke, Karthik Ramani, Winnie W. Yeung, Anthony P. DeLaurier, Luc R. Semeria, David A. Gotwalt, Srinivasa Rangan Sridharan, Muditha Kanchana
  • Publication number: 20190266102
    Abstract: Systems, apparatuses, and methods for efficiently allocating data in a cache are described. In various embodiments, a processor decodes an indication in a software application identifying a temporal data set. The data set is flagged with a data set identifier (DSID) indicating temporal data to drop after consumption. When the data set is allocated in a cache, the data set is stored with a non-replaceable attribute to prevent a cache replacement policy from evicting the data set before it is dropped. A drop command with an indication of the DSID of the data set is later issued after the data set is read (consumed). A copy of the data set is not written back to the lower-level memory although the data set is removed from the cache. An interrupt is generated to notify firmware or other software of the completion of the drop command.
    Type: Application
    Filed: May 13, 2019
    Publication date: August 29, 2019
    Inventors: Wolfgang H. Klingauf, Kenneth C. Dyke, Karthik Ramani, Winnie W. Yeung, Anthony P. DeLaurier, Luc R. Semeria, David A. Gotwalt, Srinivasa Rangan Sridharan, Muditha Kanchana
  • Patent number: 10289565
    Abstract: Systems, apparatuses, and methods for efficiently allocating data in a cache are described. In various embodiments, a processor decodes an indication in a software application identifying a temporal data set. The data set is flagged with a data set identifier (DSID) indicating temporal data to drop after consumption. When the data set is allocated in a cache, the data set is stored with a non-replaceable attribute to prevent a cache replacement policy from evicting the data set before it is dropped. A drop command with an indication of the DSID of the data set is later issued after the data set is read (consumed). A copy of the data set is not written back to the lower-level memory although the data set is removed from the cache. An interrupt is generated to notify firmware or other software of the completion of the drop command.
    Type: Grant
    Filed: May 31, 2017
    Date of Patent: May 14, 2019
    Assignee: Apple Inc.
    Inventors: Wolfgang H. Klingauf, Kenneth C. Dyke, Karthik Ramani, Winnie W. Yeung, Anthony P. DeLaurier, Luc R. Semeria, David A. Gotwalt, Srinivasa Rangan Sridharan, Muditha Kanchana
  • Publication number: 20180349291
    Abstract: Systems, apparatuses, and methods for efficiently allocating data in a cache are described. In various embodiments, a processor decodes an indication in a software application identifying a temporal data set. The data set is flagged with a data set identifier (DSID) indicating temporal data to drop after consumption. When the data set is allocated in a cache, the data set is stored with a non-replaceable attribute to prevent a cache replacement policy from evicting the data set before it is dropped. A drop command with an indication of the DSID of the data set is later issued after the data set is read (consumed). A copy of the data set is not written back to the lower-level memory although the data set is removed from the cache. An interrupt is generated to notify firmware or other software of the completion of the drop command.
    Type: Application
    Filed: May 31, 2017
    Publication date: December 6, 2018
    Inventors: Wolfgang H. Klingauf, Kenneth C. Dyke, Karthik Ramani, Winnie W. Yeung, Anthony P. DeLaurier, Luc R. Semeria, David A. Gotwalt, Srinivasa Rangan Sridharan, Muditha Kanchana
  • Patent number: 8098255
    Abstract: A graphics system including a custom graphics and audio processor produces exciting 2D and 3D graphics and surround sound. The system includes a graphics and audio processor including a 3D graphics pipeline and an audio digital signal processor. A memory controller performs a wide range of memory control related functions including arbitrating between various competing resources seeking access to main memory, handling memory latency and bandwidth requirements of the resources requesting memory access, buffering writes to reduce bus turn around, refreshing main memory, and protecting main memory using programmable registers. The memory controller minimizes memory read/write switching using a “global” write queue which queues write requests from various diverse competing resources. In this fashion, multiple competing resources for memory writes are combined into one resource from which write requests are obtained.
    Type: Grant
    Filed: May 22, 2009
    Date of Patent: January 17, 2012
    Assignee: Nintendo Co., Ltd.
    Inventors: Farhad Fouladi, Winnie W. Yeung, Howard Cheng
  • Publication number: 20090225094
    Abstract: A graphics system including a custom graphics and audio processor produces exciting 2D and 3D graphics and surround sound. The system includes a graphics and audio processor including a 3D graphics pipeline and an audio digital signal processor. A memory controller performs a wide range of memory control related functions including arbitrating between various competing resources seeking access to main memory, handling memory latency and bandwidth requirements of the resources requesting memory access, buffering writes to reduce bus turn around, refreshing main memory, and protecting main memory using programmable registers. The memory controller minimizes memory read/write switching using a “global” write queue which queues write requests from various diverse competing resources. In this fashion, multiple competing resources for memory writes are combined into one resource from which write requests are obtained.
    Type: Application
    Filed: May 22, 2009
    Publication date: September 10, 2009
    Inventors: Farhad Fouladi, Winnie W. Yeung, Howard Cheng
  • Patent number: 7538772
    Abstract: A graphics system including a custom graphics and audio processor produces exciting 2D and 3D graphics and surround sound. The system includes a graphics and audio processor including a 3D graphics pipeline and an audio digital signal processor. A memory controller performs a wide range of memory control related functions including arbitrating between various competing resources seeking access to main memory, handling memory latency and bandwidth requirements of the resources requesting memory access, buffering writes to reduce bus turn around, refreshing main memory, and protecting main memory using programmable registers. The memory controller minimizes memory read/write switching using a “global” write queue which queues write requests from various diverse competing resources. In this fashion, multiple competing resources for memory writes are combined into one resource from which write requests are obtained.
    Type: Grant
    Filed: November 28, 2000
    Date of Patent: May 26, 2009
    Assignee: Nintendo Co., Ltd.
    Inventors: Farhad Fouladi, Winnie W. Yeung, Howard Cheng