Patents by Inventor Patrick P. Lai

Patrick P. Lai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10852810
    Abstract: An integrated circuit comprising a plurality of last-level caches, a plurality of processor cores configured to access data in the plurality of last-level caches, and an interconnect network. The plurality of last-level caches can be placed in at least a high cache-power consumption mode and a low cache-power consumption mode. The plurality of last-level caches includes a first last-level cache and a second last-level cache. The interconnect network comprises a plurality of links that can be placed in at least a high link-power consumption mode and a low link-power consumption mode. The interconnect network is configured to cause a first subset of the plurality of links to be placed in the low link-power consumption mode based at least in part on the first last-level cache being in the low cache-power consumption mode.
    Type: Grant
    Filed: March 6, 2019
    Date of Patent: December 1, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
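
As a rough illustration of the cache/interconnect power coupling described in patent 10852810, here is a minimal Python sketch; the `Link` and `LLCSlice` classes, the two-slice topology, and the boolean power flags are illustrative assumptions, not the patented design.

```python
# Toy model: interconnect links are placed in low-power mode when the
# last-level cache (LLC) slice they serve enters its low-power mode.

class Link:
    def __init__(self, name):
        self.name = name
        self.low_power = False

class LLCSlice:
    def __init__(self, name, links):
        self.name = name
        self.links = links          # links that primarily serve this slice
        self.low_power = False

    def set_low_power(self, low):
        self.low_power = low
        # The interconnect mirrors the cache's power state: links that
        # only carry traffic for this slice follow it into low power.
        for link in self.links:
            link.low_power = low

llc0 = LLCSlice("LLC0", [Link("L0a"), Link("L0b")])
llc1 = LLCSlice("LLC1", [Link("L1a")])

llc0.set_low_power(True)
print([(l.name, l.low_power) for l in llc0.links])  # both links now low power
print([(l.name, l.low_power) for l in llc1.links])  # unaffected
```
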
  • Patent number: 10591978
    Abstract: Processors may include cache circuitry that is a significant source of power consumption. When a cache is about to be placed into a lower power mode, and based at least in part on this anticipated transition, the contents of the cache data lines are copied into persistent storage. While the cache is in the lower power mode, the tag circuitry is kept operational. When an access request is made to the cache, a relatively fast lookup of the tag in the tag array can be made, and the location where the associated cache line is stored in the persistent storage may be determined from the tag data. Upon a tag hit, the system is able to find the contents of the requested cache line in the persistent storage without returning the storage array of the cache to a fully operational state.
    Type: Grant
    Filed: May 30, 2017
    Date of Patent: March 17, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
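
A minimal sketch of the idea in patent 10591978, with one dictionary standing in for the still-powered tag array and another for persistent storage; all names and structures here are hypothetical.

```python
# Toy model: while the cache's data array sleeps, its tag array stays
# powered.  A tag hit yields the location of the line's contents in
# persistent storage, so the data array need not be woken up.

class SleepingCache:
    def __init__(self):
        self.tags = {}              # tag -> location in persistent storage
        self.persistent = {}        # location -> cache-line contents

    def flush_to_persistent(self, lines):
        # Called before entering the lower power mode: copy each data
        # line out and remember where it went, keyed by its tag.
        for loc, (tag, data) in enumerate(lines.items()):
            self.tags[tag] = loc
            self.persistent[loc] = data

    def lookup(self, tag):
        loc = self.tags.get(tag)    # fast lookup in the powered tag array
        if loc is None:
            return None             # tag miss: forward the request elsewhere
        return self.persistent[loc] # tag hit: read from persistent storage

cache = SleepingCache()
cache.flush_to_persistent({"0xAB": b"line-A", "0xCD": b"line-B"})
print(cache.lookup("0xAB"))  # b'line-A', served without waking the data array
```
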
  • Patent number: 10565122
    Abstract: The lookup of accesses (including snoops) to cache tag ways is serialized to perform one (or fewer than all) tag way access per clock (or even slower). Thus, for an N-way set-associative cache, instead of performing the lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. Way prediction is used to select the order in which the N ways are searched, including which tag way is searched first. This helps reduce the average number of cycles and lookups required.
    Type: Grant
    Filed: May 30, 2017
    Date of Patent: February 18, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
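
The serialized, prediction-ordered lookup of patent 10565122 might be modeled like this; the probe order, the cycle counting, and the tag values are illustrative assumptions.

```python
# Toy model: serialized tag lookup, one way per "cycle", with a way
# predictor choosing the search order so the expected hit comes first.

def serialized_lookup(tag_ways, target_tag, predicted_way):
    # Probe the predicted way first, then the rest in index order.
    order = [predicted_way] + [w for w in range(len(tag_ways))
                               if w != predicted_way]
    for cycles, way in enumerate(order, start=1):
        if tag_ways[way] == target_tag:
            return way, cycles       # hit after `cycles` single-way probes
    return None, len(tag_ways)       # miss: every way was probed

tag_ways = ["t3", "t7", "t1", "t9"]            # 4-way set, one tag per way
way, cycles = serialized_lookup(tag_ways, "t1", predicted_way=2)
print(way, cycles)                             # way 2 found in 1 cycle
```
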
  • Patent number: 10409763
    Abstract: Various embodiments of the invention are described, including: (1) a method and apparatus for intelligently allocating threads within a binary translation system; (2) data cache way prediction guided by binary translation code morphing software; (3) fast interpreter hardware support on the data side; (4) out-of-order retirement; (5) decoupled load retirement in an atomic out-of-order (OOO) processor; (6) handling transactional and atomic memory in an out-of-order binary-translation-based processor; and (7) speculative memory management in a binary-translation-based out-of-order processor.
    Type: Grant
    Filed: June 30, 2014
    Date of Patent: September 10, 2019
    Assignee: Intel Corporation
    Inventors: Patrick P. Lai, Ethan Schuchman, David Keppel, Denis M. Khartikov, Polychronis Xekalakis, Joshua B. Fryman, Allan D. Knies, Naveen Neelakantam, Gregor Stellpflug, John H. Kelm, Mirem Hyuseinova Seidahmedova, Demos Pavlou, Jaroslaw Topp
  • Publication number: 20190204898
    Abstract: An integrated circuit comprising a plurality of last-level caches, a plurality of processor cores configured to access data in the plurality of last-level caches, and an interconnect network. The plurality of last-level caches can be placed in at least a high cache-power consumption mode and a low cache-power consumption mode. The plurality of last-level caches includes a first last-level cache and a second last-level cache. The interconnect network comprises a plurality of links that can be placed in at least a high link-power consumption mode and a low link-power consumption mode. The interconnect network is configured to cause a first subset of the plurality of links to be placed in the low link-power consumption mode based at least in part on the first last-level cache being in the low cache-power consumption mode.
    Type: Application
    Filed: March 6, 2019
    Publication date: July 4, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Patent number: 10324850
    Abstract: A cache system is configurable to trade power consumption for cache access latency. When it is desired for a system with a cache to conserve dynamic power, the lookup of accesses (e.g., snoops) to cache tag ways is serialized to perform one (or fewer than all) tag way access per clock (or even slower). Thus, for an N-way set-associative cache, instead of performing a lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. This takes N times more cycles, thereby reducing the access/snoop bandwidth by a factor of N. However, the power consumption of the serialized access is reduced compared to ‘all parallel’ accesses/snoops.
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: June 18, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
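
Since patent 10324850 trades an N-fold latency and bandwidth cost for lower dynamic power, a back-of-the-envelope comparison can make the trade-off concrete; the numbers below are assumptions for illustration, not measurements.

```python
# Back-of-the-envelope numbers for the serial-vs-parallel trade-off:
# probing one tag way per clock takes N times more cycles and divides
# the snoop bandwidth by N, but only one way's comparators switch per
# clock, which is where the dynamic power saving comes from.

N = 8                                  # ways in the set-associative cache
parallel = {"cycles": 1, "ways_probed_per_clock": N}
serial   = {"cycles": N, "ways_probed_per_clock": 1}

print("latency ratio  :", serial["cycles"] / parallel["cycles"])  # N x slower
print("bandwidth ratio:", 1 / N)                                  # 1/N snoops
print("peak activity  :", serial["ways_probed_per_clock"],
      "way(s) per clock vs", parallel["ways_probed_per_clock"])
```
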
  • Patent number: 10324724
    Abstract: Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: June 18, 2019
    Assignee: Intel Corporation
    Inventors: Patrick P. Lai, Tyler N. Sondag, Sebastian Winkel, Polychronis Xekalakis, Ethan Schuchman, Jayesh Iyer
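
A toy fusion pass in the spirit of patent 10324724; the tuple-based instruction encoding and the `cmp`/`jcc`/`cmp_jcc` opcode names are invented for illustration.

```python
# Toy fusion pass: adjacent compare+branch pairs in the translated
# stream are fused into a single "cmp_jcc" macro-op before decode.

def fuse(stream):
    fused, i = [], 0
    while i < len(stream):
        op = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if op[0] == "cmp" and nxt and nxt[0] == "jcc":
            # Combine the pair into one fused instruction.
            fused.append(("cmp_jcc",) + op[1:] + nxt[1:])
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused

translated = [("mov", "r1", "r2"), ("cmp", "r1", "0"), ("jcc", "ne", "loop")]
print(fuse(translated))
# [('mov', 'r1', 'r2'), ('cmp_jcc', 'r1', '0', 'ne', 'loop')]
```
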
  • Patent number: 10318428
    Abstract: A multi-core processing chip where the last-level cache functionality is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The hash function used by the processors on the chip is changed according to which of the last-level caches are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). Thus, a first hash function is used to distribute accesses (i.e., reads and writes of data blocks) to all of the last-level caches when, for example, all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches when, for example, some of the last-level caches are ‘off.’ The chip controls power consumption by turning cache slices on and off based on power states, and consequently switches dynamically among at least two hash functions.
    Type: Grant
    Filed: September 12, 2016
    Date of Patent: June 11, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
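
Patent 10318428's hash switching might be sketched as follows; the modulo hash, the slice names, and the flush/migrate ordering are simplifying assumptions.

```python
# Toy model: a block's home slice is chosen by hashing its physical
# address over the currently active slices; powering slices off swaps
# in a hash over the smaller active subset.

def home_slice(addr, active_slices):
    # Simple modulo hash for illustration; real designs use stronger hashes.
    return active_slices[hash(addr) % len(active_slices)]

all_slices = ["LLC0", "LLC1", "LLC2", "LLC3"]

addr = 0x7F3A40
print(home_slice(addr, all_slices))     # first hash: over all four slices

# Two slices enter low power: the second hash targets only the active
# pair, after the disabled slices' contents have been flushed/migrated.
active = ["LLC0", "LLC2"]
print(home_slice(addr, active))
```
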
  • Patent number: 10303603
    Abstract: A special class of loads and stores accesses a user-defined memory region where coherency and memory ordering are enforced only at the coherent point. Coherent memory requests, which are limited to the user-defined memory region, are dispatched to the common memory ordering buffer. Non-coherent memory requests (i.e., all other memory requests) can be routed via non-coherent lower-level caches to the shared last-level cache. By assigning a private, non-overlapping address space to each of the processor cores, the lower-level caches do not need to implement the logic necessary to maintain cache coherency. This can reduce power consumption and integrated circuit die area. It can also improve memory bandwidth and performance for applications with predominantly non-coherent memory accesses while still providing memory coherence for the specific memory range(s)/applications that demand it.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: May 28, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Patrick P. Lai
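
A minimal sketch of the routing rule in patent 10303603, assuming a single user-defined coherent address range; the bounds and path labels are hypothetical.

```python
# Toy router: loads/stores that fall inside the user-defined coherent
# region go to the common memory-ordering buffer (the coherent point);
# everything else takes the non-coherent path through the local caches.

COHERENT_LO, COHERENT_HI = 0x1000_0000, 0x1FFF_FFFF  # assumed region bounds

def route(addr):
    if COHERENT_LO <= addr <= COHERENT_HI:
        return "common memory-ordering buffer"   # coherency enforced here
    return "non-coherent L1/L2 -> shared LLC"    # no snoop logic needed

print(route(0x1234_5678))   # coherent path
print(route(0x8000_0000))   # non-coherent path
```
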
  • Patent number: 10282298
    Abstract: A system that uses a write-invalidate protocol has two types of stores: a traditional store that operates using a write-back policy and snoops for copies of the cache line at lower cache levels, and a store that writes, using a coherent write-through policy, directly to the last-level cache without snooping the lower cache levels. A separate store buffer may be maintained in the processor for the coherent write-through operations. A special bit may be maintained in the entries of a store buffer that is used both for traditional write-back policy stores and for coherent write-through policy stores. This bit indicates that loads and stores older than the last speculative store in the store buffer are allowed to be performed.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: May 7, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Patrick P. Lai
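
The two store types and the store-buffer bit of patent 10282298 might be modeled roughly as below; the `Entry` fields and the "performable" cut-off are one interpretation, for illustration only.

```python
# Toy store buffer shared by both store types.  Each entry carries a
# flag for coherent write-through stores; the abstract's "special bit"
# marks the point before which older loads/stores may be performed.

from collections import namedtuple

Entry = namedtuple("Entry", "addr data write_through speculative")

store_buffer = [
    Entry(0x100, 1, write_through=False, speculative=False),  # write-back store
    Entry(0x200, 2, write_through=True,  speculative=True),   # last speculative
    Entry(0x300, 3, write_through=True,  speculative=False),
]

# Operations older than the last speculative store are safe to perform.
last_spec = max(i for i, e in enumerate(store_buffer) if e.speculative)
performable = store_buffer[:last_spec]
print([hex(e.addr) for e in performable])   # ['0x100']
```
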
  • Patent number: 10241561
    Abstract: The hash function used by the processors on a multi-processor chip to distribute accesses to the various last-level caches via the links is changed according to which last-level caches (and/or links) are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). A first hash function is used to distribute accesses to all of the last-level caches and all of the links when all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches and the corresponding subset of links when some of the last-level caches are ‘off.’ Data can be sent only to the active last-level caches via active links. By shutting off links connected to caches and components that are in a lower power consumption mode, the power consumption of the chip is reduced.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: March 26, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180357169
    Abstract: A special class of loads and stores accesses a user-defined memory region where coherency and memory ordering are enforced only at the coherent point. Coherent memory requests, which are limited to the user-defined memory region, are dispatched to the common memory ordering buffer. Non-coherent memory requests (i.e., all other memory requests) can be routed via non-coherent lower-level caches to the shared last-level cache. By assigning a private, non-overlapping address space to each of the processor cores, the lower-level caches do not need to implement the logic necessary to maintain cache coherency. This can reduce power consumption and integrated circuit die area. It can also improve memory bandwidth and performance for applications with predominantly non-coherent memory accesses while still providing memory coherence for the specific memory range(s)/applications that demand it.
    Type: Application
    Filed: June 13, 2017
    Publication date: December 13, 2018
    Inventor: Patrick P. Lai
  • Publication number: 20180357172
    Abstract: A system that uses a write-invalidate protocol has two types of stores: a traditional store that operates using a write-back policy and snoops for copies of the cache line at lower cache levels, and a store that writes, using a coherent write-through policy, directly to the last-level cache without snooping the lower cache levels. A separate store buffer may be maintained in the processor for the coherent write-through operations. A special bit may be maintained in the entries of a store buffer that is used both for traditional write-back policy stores and for coherent write-through policy stores. This bit indicates that loads and stores older than the last speculative store in the store buffer are allowed to be performed.
    Type: Application
    Filed: June 13, 2017
    Publication date: December 13, 2018
    Inventor: Patrick P. Lai
  • Publication number: 20180356874
    Abstract: The hash function used by the processors on a multi-processor chip to distribute accesses to the various last-level caches via the links is changed according to which last-level caches (and/or links) are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). A first hash function is used to distribute accesses to all of the last-level caches and all of the links when all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches and the corresponding subset of links when some of the last-level caches are ‘off.’ Data can be sent only to the active last-level caches via active links. By shutting off links connected to caches and components that are in a lower power consumption mode, the power consumption of the chip is reduced.
    Type: Application
    Filed: June 13, 2017
    Publication date: December 13, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180349284
    Abstract: The lookup of accesses (including snoops) to cache tag ways is serialized to perform one (or fewer than all) tag way access per clock (or even slower). Thus, for an N-way set-associative cache, instead of performing the lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. Way prediction is used to select the order in which the N ways are searched, including which tag way is searched first. This helps reduce the average number of cycles and lookups required.
    Type: Application
    Filed: May 30, 2017
    Publication date: December 6, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180348847
    Abstract: Processors may include cache circuitry that is a significant source of power consumption. When a cache is about to be placed into a lower power mode, and based at least in part on this anticipated transition, the contents of the cache data lines are copied into persistent storage. While the cache is in the lower power mode, the tag circuitry is kept operational. When an access request is made to the cache, a relatively fast lookup of the tag in the tag array can be made, and the location where the associated cache line is stored in the persistent storage may be determined from the tag data. Upon a tag hit, the system is able to find the contents of the requested cache line in the persistent storage without returning the storage array of the cache to a fully operational state.
    Type: Application
    Filed: May 30, 2017
    Publication date: December 6, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180336143
    Abstract: A first cache is paired at the same cache level with a second cache that has higher capacity but is slower. Accesses to both caches are performed in parallel, and whichever cache hits and returns the data first is considered a valid cache read-hit. The higher-capacity cache is configured to have multiple power-saving modes while also having a high level of associativity in order to minimize conflict and capacity misses. Transfers can move cache lines between the two caches at the same level (i.e., without crossing a large inter-cache-level or inter-processor fabric) in order to adapt to changing access patterns. This functionality allows access latency to be traded off against power consumption.
    Type: Application
    Filed: May 22, 2017
    Publication date: November 22, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
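
A toy model of the paired same-level caches in publication 20180336143; the fixed latencies and dictionary-backed caches are illustrative assumptions.

```python
# Toy model of paired same-level caches: both are probed in parallel
# and the first hit wins.  Here "first" is decided by each cache's
# assumed access latency.

class PairedCache:
    def __init__(self, fast, slow, fast_latency=2, slow_latency=6):
        self.fast, self.slow = fast, slow
        self.fast_latency, self.slow_latency = fast_latency, slow_latency

    def read(self, addr):
        # Issue both lookups "in parallel"; take whichever hit is earliest.
        hits = []
        if addr in self.fast:
            hits.append((self.fast_latency, self.fast[addr]))
        if addr in self.slow:
            hits.append((self.slow_latency, self.slow[addr]))
        return min(hits)[1] if hits else None   # earliest valid read-hit

pair = PairedCache(fast={0x10: "A"}, slow={0x10: "A-copy", 0x20: "B"})
print(pair.read(0x10))   # "A": the small fast cache answers first
print(pair.read(0x20))   # "B": only the large slow cache holds it
```
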
  • Publication number: 20180210836
    Abstract: A multi-core processing chip where the last-level cache is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The various processors of the chip decide which last-level cache is to hold a given data block by applying a temperature- or reliability-dependent hash function to the physical address. While the system is running, a last-level cache that is overheating or being overused is taken out of use by changing the hash function. Before accesses to the overheating cache are prevented, the contents of that cache are migrated to other last-level caches per the changed hash function. When a core processor associated with a last-level cache is shut down, when processes/threads are removed from that core, or when the core is overheating, use of the associated last-level cache can likewise be prevented by changing the hash function, with the contents migrated to other caches.
    Type: Application
    Filed: January 24, 2017
    Publication date: July 26, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
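
Publication 20180210836's temperature-driven re-hashing and migration might look like this in miniature; the threshold, temperatures, and modulo hash are invented for illustration.

```python
# Toy model: when a slice overheats, it is dropped from the hash and
# its contents are migrated to the slices the new hash maps them to.

def home(addr, slices):
    return slices[hash(addr) % len(slices)]   # illustrative modulo hash

slices = {"LLC0": 60, "LLC1": 95, "LLC2": 58}   # name -> temperature (C)
contents = {"LLC0": [0x100], "LLC1": [0x200, 0x300], "LLC2": []}

THRESHOLD = 90                                  # assumed overheating limit
healthy = [s for s, temp in sorted(slices.items()) if temp < THRESHOLD]

# Migrate lines out of the overheating slice before it stops taking accesses.
for s in list(contents):
    if s not in healthy:
        for addr in contents.pop(s):
            contents.setdefault(home(addr, healthy), []).append(addr)

print(healthy)    # ['LLC0', 'LLC2']
print(contents)   # LLC1's lines now live in surviving slices per the new hash
```
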
  • Publication number: 20180137054
    Abstract: A cache system is configurable to trade power consumption for cache access latency. When it is desired for a system with a cache to conserve dynamic power, the lookup of accesses (e.g., snoops) to cache tag ways is serialized to perform one (or fewer than all) tag way access per clock (or even slower). Thus, for an N-way set-associative cache, instead of performing a lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. This takes N times more cycles, thereby reducing the access/snoop bandwidth by a factor of N. However, the power consumption of the serialized access is reduced compared to ‘all parallel’ accesses/snoops.
    Type: Application
    Filed: November 11, 2016
    Publication date: May 17, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180074964
    Abstract: A multi-core processing chip where the last-level cache functionality is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The hash function used by the processors on the chip is changed according to which of the last-level caches are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). Thus, a first hash function is used to distribute accesses (i.e., reads and writes of data blocks) to all of the last-level caches when, for example, all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches when, for example, some of the last-level caches are ‘off.’ The chip controls power consumption by turning cache slices on and off based on power states, and consequently switches dynamically among at least two hash functions.
    Type: Application
    Filed: September 12, 2016
    Publication date: March 15, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer