Patents by Inventor PATRICK P. LAI
PATRICK P. LAI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10852810
Abstract: An integrated circuit comprising a plurality of last-level caches, a plurality of processor cores configured to access data in the plurality of last-level caches, and an interconnect network. The plurality of last-level caches can be placed in at least a high cache-power consumption mode and a low cache-power consumption mode. The plurality of last-level caches includes a first last-level cache and a second last-level cache. The interconnect network comprises a plurality of links that can be placed in at least a high link-power consumption mode and a low link-power consumption mode. The interconnect network is configured to cause a first subset of the plurality of links to be placed in the low link-power consumption mode based at least in part on the first last-level cache being in the low cache-power consumption mode.
Type: Grant
Filed: March 6, 2019
Date of Patent: December 1, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Patrick P. Lai, Robert Allen Shearer
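The coupling the abstract describes, where a link's power mode follows the power mode of the last-level cache it reaches, can be sketched as follows. This is a minimal illustration, not the patented circuit; all class and function names here are invented.

```python
# Hypothetical sketch: power down interconnect links whose endpoint
# last-level cache (LLC) has entered its low cache-power consumption mode.

class LLC:
    def __init__(self, name):
        self.name = name
        self.mode = "high"          # "high" or "low" cache-power mode

class Link:
    def __init__(self, endpoint_llc):
        self.endpoint_llc = endpoint_llc
        self.mode = "high"          # "high" or "low" link-power mode

def update_link_power(links):
    """Place a link in low link-power mode when its LLC is in low cache-power mode."""
    for link in links:
        link.mode = "low" if link.endpoint_llc.mode == "low" else "high"

llc0, llc1 = LLC("llc0"), LLC("llc1")
links = [Link(llc0), Link(llc1)]
llc0.mode = "low"                   # the first LLC enters its low-power mode
update_link_power(links)
print([l.mode for l in links])      # ['low', 'high']: the link to llc0 follows it down
```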
-
Patent number: 10591978
Abstract: Processors may include cache circuitry that is a significant source of power consumption. A cache is going to be placed into a lower power mode. Based at least in part on this anticipated transition, the contents of the cache data lines are copied into persistent storage. While the cache is in the lower power mode, the tag circuitry is kept operational. When an access request is made to the cache, a relatively fast lookup of the tag in the tag array can be made. The location where the associated cache line is stored in the persistent storage may be determined from the tag data. Upon a tag hit, the system is able to find the contents of the requested cache line in the persistent storage without returning the storage array of the cache to a fully operational state.
Type: Grant
Filed: May 30, 2017
Date of Patent: March 17, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Patrick P. Lai, Robert Allen Shearer
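The flow above — data lines copied to persistent storage, tags kept live, and a tag hit resolving to a persistent-storage location — can be modeled in a few lines. This is an illustrative sketch under invented names, not the patented design.

```python
# Hypothetical sketch: the data array is powered down but the tag array stays
# operational; a tag hit yields the line's location in persistent storage,
# so the request is served without waking the data array.

class LowPowerCache:
    def __init__(self):
        self.tags = {}              # tag -> slot in persistent storage
        self.persistent = {}        # slot -> cache-line contents

    def enter_low_power(self, data_lines):
        """Copy the data lines to persistent storage before the data array powers down."""
        for slot, (tag, contents) in enumerate(data_lines.items()):
            self.tags[tag] = slot
            self.persistent[slot] = contents

    def lookup(self, tag):
        """Fast lookup in the still-powered tag array; on a hit, read persistent storage."""
        slot = self.tags.get(tag)
        return None if slot is None else self.persistent[slot]

cache = LowPowerCache()
cache.enter_low_power({0x1000: b"line-A", 0x2000: b"line-B"})
print(cache.lookup(0x1000))   # b'line-A', served from persistent storage
print(cache.lookup(0x3000))   # None: tag miss
```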
-
Patent number: 10565122
Abstract: The lookup of accesses (including snoops) to cache tag ways is serialized to perform one (or less than all) tag way access per clock (or even slower). Thus, for an N-way set associative cache, instead of performing lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. Way prediction is utilized to select the order in which the N ways are searched, including which tag way is searched first. This helps to reduce the average number of cycles and lookups required.
Type: Grant
Filed: May 30, 2017
Date of Patent: February 18, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Patrick P. Lai, Robert Allen Shearer
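A serialized, prediction-ordered tag lookup of the kind described can be sketched as below. The function name and the simple list model of a tag array are my own; the point is only that a good predicted order lets a hit complete after one single-way probe instead of N parallel comparisons.

```python
# Hypothetical sketch: probe the tag ways one per cycle, in the order chosen
# by a way predictor, instead of comparing all N ways in parallel.

def serialized_lookup(tag_ways, tag, predicted_order):
    """Probe tag ways one at a time in predicted order; return (way, cycles)."""
    for cycles, way in enumerate(predicted_order, start=1):
        if tag_ways[way] == tag:
            return way, cycles           # hit after `cycles` single-way probes
    return None, len(predicted_order)    # miss after probing every way

tag_ways = [0x11, 0x22, 0x33, 0x44]      # one set of a 4-way cache's tag array
way, cycles = serialized_lookup(tag_ways, 0x33, predicted_order=[2, 0, 1, 3])
print(way, cycles)   # 2 1 -- the predictor guessed right, hit on the first probe
```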
-
Patent number: 10409763
Abstract: Various different embodiments of the invention are described, including: (1) a method and apparatus for intelligently allocating threads within a binary translation system; (2) data cache way prediction guided by binary translation code morphing software; (3) fast interpreter hardware support on the data side; (4) out-of-order retirement; (5) decoupled load retirement in an atomic OOO processor; (6) handling transactional and atomic memory in an out-of-order binary-translation-based processor; and (7) speculative memory management in a binary-translation-based out-of-order processor.
Type: Grant
Filed: June 30, 2014
Date of Patent: September 10, 2019
Assignee: Intel Corporation
Inventors: Patrick P. Lai, Ethan Schuchman, David Keppel, Denis M. Khartikov, Polychronis Xekalakis, Joshua B. Fryman, Allan D. Knies, Naveen Neelakantam, Gregor Stellpflug, John H. Kelm, Mirem Hyuseinova Seidahmedova, Demos Pavlou, Jaroslaw Topp
-
Publication number: 20190204898
Abstract: An integrated circuit comprising a plurality of last-level caches, a plurality of processor cores configured to access data in the plurality of last-level caches, and an interconnect network. The plurality of last-level caches can be placed in at least a high cache-power consumption mode and a low cache-power consumption mode. The plurality of last-level caches includes a first last-level cache and a second last-level cache. The interconnect network comprises a plurality of links that can be placed in at least a high link-power consumption mode and a low link-power consumption mode. The interconnect network is configured to cause a first subset of the plurality of links to be placed in the low link-power consumption mode based at least in part on the first last-level cache being in the low cache-power consumption mode.
Type: Application
Filed: March 6, 2019
Publication date: July 4, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: Patrick P. LAI, Robert Allen SHEARER
-
Patent number: 10324850
Abstract: A cache system is configurable to trade power consumption for cache access latency. When it is desired for a system with a cache to conserve dynamic power, the lookup of accesses (e.g., snoops) to cache tag ways is serialized to perform one (or less than all) tag way access per clock (or even slower). Thus, for an N-way set associative cache, instead of performing a lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. This takes N times more cycles, thereby reducing the access/snoop bandwidth by a factor of N. However, the serialized accesses consume less power than ‘all parallel’ accesses/snoops.
Type: Grant
Filed: November 11, 2016
Date of Patent: June 18, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Patrick P. Lai, Robert Allen Shearer
-
Patent number: 10324724
Abstract: Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
Type: Grant
Filed: December 16, 2015
Date of Patent: June 18, 2019
Assignee: Intel Corporation
Inventors: Patrick P. Lai, Tyler N. Sondag, Sebastian Winkel, Polychronis Xekalakis, Ethan Schuchman, Jayesh Iyer
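The fusion step above — a fusion manager merging adjacent instructions of the translated stream into one fused instruction ahead of decode — can be illustrated with a toy example. The instruction mnemonics and the compare-plus-branch pattern are my invention for illustration only; the patent does not specify which instructions are fused.

```python
# Toy sketch: fuse adjacent ('cmp', x) + ('jcc', target) pairs in a translated
# instruction stream into a single fused ('cmp_jcc', x, target) operation.

def fuse(stream):
    """Return the stream with each adjacent cmp+jcc pair fused into one op."""
    out, i = [], 0
    while i < len(stream):
        if (i + 1 < len(stream)
                and stream[i][0] == "cmp" and stream[i + 1][0] == "jcc"):
            out.append(("cmp_jcc", stream[i][1], stream[i + 1][1]))
            i += 2                       # consume both instructions
        else:
            out.append(stream[i])
            i += 1
    return out

translated = [("mov", "r1"), ("cmp", "r1"), ("jcc", "L1")]
print(fuse(translated))   # [('mov', 'r1'), ('cmp_jcc', 'r1', 'L1')]
```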
-
Patent number: 10318428
Abstract: A multi-core processing chip where the last-level cache functionality is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The hash function used by the processors on the chip is changed according to which of the last-level caches are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). Thus, a first hash function is used to distribute accesses (i.e., reads and writes of data blocks) to all of the last-level caches when, for example, all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches when, for example, some of the last-level caches are ‘off.’ The chip controls power consumption by turning cache slices on and off based on power states, and consequently switches dynamically among at least two hash functions.
Type: Grant
Filed: September 12, 2016
Date of Patent: June 11, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Patrick P. Lai, Robert Allen Shearer
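The hash-switching idea can be sketched as a function of the physical address and the currently active slice set: the "first hash function" is this mapping over all slices, the "second" is the same mapping over the powered-on subset. The helper name and the modulo-of-line-address hash are assumptions of this sketch, not the patented hash.

```python
# Hypothetical sketch: map a physical address to an LLC slice; switching the
# active slice list is equivalent to switching between the two hash functions.

def slice_for(addr, active_slices):
    """Hash a physical address onto one of the currently active LLC slices."""
    return active_slices[(addr >> 6) % len(active_slices)]   # 64-byte lines

all_on = [0, 1, 2, 3]             # first hash: every slice is 'on'
half_off = [0, 1]                 # second hash: slices 2 and 3 are 'off'

addr = 0x80                       # line address 2
print(slice_for(addr, all_on))    # 2: distributed across all four slices
print(slice_for(addr, half_off))  # 0: same address remapped to an active slice
```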
-
Patent number: 10303603
Abstract: A special class of loads and stores accesses a user-defined memory region where coherency and memory order are only enforced at the coherent point. Coherent memory requests, which are limited to the user-defined memory region, are dispatched to the common memory ordering buffer. Non-coherent memory requests (e.g., all other memory requests) can be routed via non-coherent lower-level caches to the shared last-level cache. By assigning private, non-overlapping address spaces to each of the processor cores, the lower-level caches do not need to implement the logic necessary to maintain cache coherency. This can reduce power consumption and integrated circuit die area. It can also improve memory bandwidth and performance for applications with predominantly non-coherent memory accesses while still providing memory coherence for the specific memory range(s)/applications that demand it.
Type: Grant
Filed: June 13, 2017
Date of Patent: May 28, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventor: Patrick P. Lai
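The routing decision the abstract describes reduces to an address-range check: requests inside the user-defined coherent region go to the common memory ordering buffer, everything else takes the non-coherent path. A minimal sketch, assuming a single region whose bounds are invented for illustration:

```python
# Hypothetical sketch: dispatch requests in the user-defined coherent region
# to the memory ordering buffer; route all other requests through the
# non-coherent lower-level caches toward the shared last-level cache.

COHERENT_BASE, COHERENT_LIMIT = 0x4000_0000, 0x4010_0000   # invented bounds

def route(addr):
    """Pick the path for a memory request based on the coherent region."""
    if COHERENT_BASE <= addr < COHERENT_LIMIT:
        return "memory-ordering-buffer"   # coherency enforced at the coherent point
    return "non-coherent-caches"          # private path, no coherence logic needed

print(route(0x4000_1000))   # inside the region: memory-ordering-buffer
print(route(0x1000_0000))   # outside the region: non-coherent-caches
```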
-
Patent number: 10282298
Abstract: A system that uses a write-invalidate protocol has two types of stores: a traditional store that operates using a write-back policy and snoops for copies of the cache line at lower cache levels, and a store that writes, using a coherent write-through policy, directly to the last-level cache without snooping the lower cache levels. A separate store buffer may be maintained in the processor for the coherent write-through operations. A special bit may be maintained in the entries of a store buffer that is used for both traditional write-back policy stores and coherent write-through policy stores. This bit indicates that loads and stores older than the last speculative store in the store buffer are allowed to be performed.
Type: Grant
Filed: June 13, 2017
Date of Patent: May 7, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventor: Patrick P. Lai
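A shared store buffer holding both kinds of stores can be modeled as entries carrying the store type plus the special bit. The field and function names below are invented; the sketch only shows the two drain paths and where the bit lives, not the patented microarchitecture.

```python
# Hypothetical sketch: one store buffer holding both traditional write-back
# stores and coherent write-through stores, with the abstract's special bit.

class StoreEntry:
    def __init__(self, addr, data, write_through, older_ops_allowed=False):
        self.addr = addr
        self.data = data
        self.write_through = write_through        # True: goes straight to the LLC
        self.older_ops_allowed = older_ops_allowed  # older loads/stores may perform

def drain_target(entry):
    """Write-through stores skip lower-level snoops and hit the last-level cache."""
    return "last-level-cache" if entry.write_through else "write-back-with-snoop"

buf = [StoreEntry(0x100, 1, write_through=False),
       StoreEntry(0x200, 2, write_through=True, older_ops_allowed=True)]
print([drain_target(e) for e in buf])   # ['write-back-with-snoop', 'last-level-cache']
```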
-
Patent number: 10241561
Abstract: The hash function used by the processors on a multi-processor chip to distribute accesses to the various last-level caches via the links is changed according to which last-level caches (and/or links) are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). A first hash function is used to distribute accesses to all of the last-level caches and all of the links when all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches and the corresponding subset of links when some of the last-level caches are ‘off.’ Data can be sent to only the active last-level caches via active links. By shutting off links connected to caches and components that are in a lower power consumption mode, the power consumption of the chip is reduced.
Type: Grant
Filed: June 13, 2017
Date of Patent: March 26, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Patrick P. Lai, Robert Allen Shearer
-
Publication number: 20180357169
Abstract: A special class of loads and stores accesses a user-defined memory region where coherency and memory order are only enforced at the coherent point. Coherent memory requests, which are limited to the user-defined memory region, are dispatched to the common memory ordering buffer. Non-coherent memory requests (e.g., all other memory requests) can be routed via non-coherent lower-level caches to the shared last-level cache. By assigning private, non-overlapping address spaces to each of the processor cores, the lower-level caches do not need to implement the logic necessary to maintain cache coherency. This can reduce power consumption and integrated circuit die area. It can also improve memory bandwidth and performance for applications with predominantly non-coherent memory accesses while still providing memory coherence for the specific memory range(s)/applications that demand it.
Type: Application
Filed: June 13, 2017
Publication date: December 13, 2018
Inventor: Patrick P. LAI
-
Publication number: 20180357172
Abstract: A system that uses a write-invalidate protocol has two types of stores: a traditional store that operates using a write-back policy and snoops for copies of the cache line at lower cache levels, and a store that writes, using a coherent write-through policy, directly to the last-level cache without snooping the lower cache levels. A separate store buffer may be maintained in the processor for the coherent write-through operations. A special bit may be maintained in the entries of a store buffer that is used for both traditional write-back policy stores and coherent write-through policy stores. This bit indicates that loads and stores older than the last speculative store in the store buffer are allowed to be performed.
Type: Application
Filed: June 13, 2017
Publication date: December 13, 2018
Inventor: Patrick P. LAI
-
Publication number: 20180356874
Abstract: The hash function used by the processors on a multi-processor chip to distribute accesses to the various last-level caches via the links is changed according to which last-level caches (and/or links) are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). A first hash function is used to distribute accesses to all of the last-level caches and all of the links when all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches and the corresponding subset of links when some of the last-level caches are ‘off.’ Data can be sent to only the active last-level caches via active links. By shutting off links connected to caches and components that are in a lower power consumption mode, the power consumption of the chip is reduced.
Type: Application
Filed: June 13, 2017
Publication date: December 13, 2018
Inventors: Patrick P. LAI, Robert Allen SHEARER
-
Publication number: 20180349284
Abstract: The lookup of accesses (including snoops) to cache tag ways is serialized to perform one (or less than all) tag way access per clock (or even slower). Thus, for an N-way set associative cache, instead of performing lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. Way prediction is utilized to select the order in which the N ways are searched, including which tag way is searched first. This helps to reduce the average number of cycles and lookups required.
Type: Application
Filed: May 30, 2017
Publication date: December 6, 2018
Inventors: Patrick P. LAI, Robert Allen SHEARER
-
Publication number: 20180348847
Abstract: Processors may include cache circuitry that is a significant source of power consumption. A cache is going to be placed into a lower power mode. Based at least in part on this anticipated transition, the contents of the cache data lines are copied into persistent storage. While the cache is in the lower power mode, the tag circuitry is kept operational. When an access request is made to the cache, a relatively fast lookup of the tag in the tag array can be made. The location where the associated cache line is stored in the persistent storage may be determined from the tag data. Upon a tag hit, the system is able to find the contents of the requested cache line in the persistent storage without returning the storage array of the cache to a fully operational state.
Type: Application
Filed: May 30, 2017
Publication date: December 6, 2018
Inventors: Patrick P. LAI, Robert Allen SHEARER
-
Publication number: 20180336143
Abstract: A first cache is paired at the same cache level with a second, higher-capacity, but slower, cache. Access to both caches is performed in parallel, and whichever cache hits and returns the data first is considered a valid cache read-hit. The higher-capacity cache is configured to have multiple power saving modes while also having a high level of associativity in order to minimize conflict and capacity misses. Transfers can move cache lines between the two caches at the same level (i.e., without crossing a large inter-cache-level or inter-processor fabric) in order to adapt to changing access patterns. This functionality allows a balancing/trade-off between access latency and power consumption.
Type: Application
Filed: May 22, 2017
Publication date: November 22, 2018
Inventors: Patrick P. LAI, Robert Allen SHEARER
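The parallel probe of two same-level caches, with the first returning hit winning, can be modeled by giving each cache a latency and taking the earliest hit. The function name and the fixed latencies are assumptions of this sketch.

```python
# Hypothetical sketch: probe a small fast cache and a larger, slower cache at
# the same level in parallel; the first hit to return (lowest latency) wins.

def paired_read(addr, fast, slow, fast_latency=1, slow_latency=3):
    """Probe both same-level caches; return (data, latency) of the earliest hit."""
    hits = []
    if addr in fast:
        hits.append((fast[addr], fast_latency))
    if addr in slow:
        hits.append((slow[addr], slow_latency))
    return min(hits, key=lambda h: h[1]) if hits else (None, None)

fast = {0x10: "A"}
slow = {0x10: "A", 0x20: "B"}            # the higher-capacity cache holds more lines
print(paired_read(0x10, fast, slow))     # ('A', 1): the fast copy wins
print(paired_read(0x20, fast, slow))     # ('B', 3): only the slow cache holds it
```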
-
Publication number: 20180210836
Abstract: A multi-core processing chip where the last-level cache is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The various processors of the chip decide which last-level cache is to hold a given data block by applying a temperature- or reliability-dependent hash function to the physical address. While the system is running, a last-level cache that is overheating, or is being overused, is taken out of use by changing the hash function. Before accesses to the overheating cache are prevented, the contents of that cache are migrated to other last-level caches per the changed hash function. When a core processor associated with a last-level cache is shut down, when processes/threads are removed from that core, or when the core is overheating, use of the associated last-level cache can be prevented by changing the hash function, with the contents migrated to other caches.
Type: Application
Filed: January 24, 2017
Publication date: July 26, 2018
Inventors: Patrick P. Lai, Robert Allen Shearer
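The sequence above — change the hash to exclude the overheating slice, but migrate that slice's contents to their new homes first — can be sketched as follows. All helper names, the modulo hash, and the set-based model of slice contents are invented for illustration.

```python
# Hypothetical sketch: drop an overheating LLC slice from the hash and migrate
# its resident lines to the slices chosen by the new hash before use stops.

def slice_for(addr, active):
    """Map a physical address onto one of the active slices (64-byte lines)."""
    return active[(addr >> 6) % len(active)]

def exclude_slice(slices, hot, contents):
    """Remove the hot slice from the hash, migrating its lines per the new hash."""
    active = [s for s in slices if s != hot]
    for addr in list(contents[hot]):
        contents[slice_for(addr, active)].add(addr)   # re-home per the new hash
        contents[hot].discard(addr)
    return active

contents = {0: {0x00}, 1: {0x40}, 2: {0x80}, 3: set()}
active = exclude_slice([0, 1, 2, 3], hot=2, contents=contents)
print(active, contents[2])   # [0, 1, 3] set(): slice 2 is out of use and empty
```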
-
Publication number: 20180137054
Abstract: A cache system is configurable to trade power consumption for cache access latency. When it is desired for a system with a cache to conserve dynamic power, the lookup of accesses (e.g., snoops) to cache tag ways is serialized to perform one (or less than all) tag way access per clock (or even slower). Thus, for an N-way set associative cache, instead of performing a lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. This takes N times more cycles, thereby reducing the access/snoop bandwidth by a factor of N. However, the serialized accesses consume less power than ‘all parallel’ accesses/snoops.
Type: Application
Filed: November 11, 2016
Publication date: May 17, 2018
Inventors: Patrick P. Lai, Robert Allen Shearer
-
Publication number: 20180074964
Abstract: A multi-core processing chip where the last-level cache functionality is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The hash function used by the processors on the chip is changed according to which of the last-level caches are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). Thus, a first hash function is used to distribute accesses (i.e., reads and writes of data blocks) to all of the last-level caches when, for example, all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches when, for example, some of the last-level caches are ‘off.’ The chip controls power consumption by turning cache slices on and off based on power states, and consequently switches dynamically among at least two hash functions.
Type: Application
Filed: September 12, 2016
Publication date: March 15, 2018
Inventors: Patrick P. Lai, Robert Allen Shearer