Patents by Inventor Patrick P. Lai

Patrick P. Lai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10852810
    Abstract: An integrated circuit comprising a plurality of last-level caches, a plurality of processor cores configured to access data in the plurality of last-level caches, and an interconnect network. The plurality of last-level caches can be placed in at least a high cache-power consumption mode and a low cache-power consumption mode. The plurality of last-level caches includes a first last-level cache and a second last-level cache. The interconnect network comprises a plurality of links that can be placed in at least a high link-power consumption mode and a low link-power consumption mode. The interconnect network is configured to cause a first subset of the plurality of links to be placed in the low link-power consumption mode based at least in part on the first last-level cache being in the low cache-power consumption mode.
    Type: Grant
    Filed: March 6, 2019
    Date of Patent: December 1, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
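
As a rough illustration of the cache/interconnect power coupling described in patent 10852810, here is a minimal Python sketch; the `Link` and `LLCSlice` classes, the two-slice topology, and the boolean power flags are illustrative assumptions, not the patented design.

```python
# Toy model: interconnect links are placed in low-power mode when the
# last-level cache (LLC) slice they serve enters its low-power mode.

class Link:
    def __init__(self, name):
        self.name = name
        self.low_power = False

class LLCSlice:
    def __init__(self, name, links):
        self.name = name
        self.links = links          # links that primarily serve this slice
        self.low_power = False

    def set_low_power(self, low):
        self.low_power = low
        # The interconnect mirrors the cache's power state: links that
        # only carry traffic for this slice follow it into low power.
        for link in self.links:
            link.low_power = low

llc0 = LLCSlice("LLC0", [Link("L0a"), Link("L0b")])
llc1 = LLCSlice("LLC1", [Link("L1a")])

llc0.set_low_power(True)
print([(l.name, l.low_power) for l in llc0.links])  # both links now low power
print([(l.name, l.low_power) for l in llc1.links])  # unaffected
```
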
  • Patent number: 10591978
    Abstract: Processors may include cache circuitry that is a significant source of power consumption. When a cache is about to be placed into a lower power mode, and based at least in part on this anticipated transition, the contents of the cache data lines are copied into persistent storage. While the cache is in the lower power mode, the tag circuitry is kept operational. When an access request is made to the cache, a relatively fast lookup of the tag in the tag array can be made, and the location where the associated cache line is stored in the persistent storage may be determined from the tag data. Upon a tag hit, the system is able to find the contents of the requested cache line in the persistent storage without returning the storage array of the cache to a fully operational state.
    Type: Grant
    Filed: May 30, 2017
    Date of Patent: March 17, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
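
A minimal sketch of the idea in patent 10591978, with one dictionary standing in for the still-powered tag array and another for persistent storage; all names and structures here are hypothetical.

```python
# Toy model: while the cache's data array sleeps, its tag array stays
# powered.  A tag hit yields the location of the line's contents in
# persistent storage, so the data array need not be woken up.

class SleepingCache:
    def __init__(self):
        self.tags = {}              # tag -> location in persistent storage
        self.persistent = {}        # location -> cache-line contents

    def flush_to_persistent(self, lines):
        # Called before entering the lower power mode: copy each data
        # line out and remember where it went, keyed by its tag.
        for loc, (tag, data) in enumerate(lines.items()):
            self.tags[tag] = loc
            self.persistent[loc] = data

    def lookup(self, tag):
        loc = self.tags.get(tag)    # fast lookup in the powered tag array
        if loc is None:
            return None             # tag miss: forward the request elsewhere
        return self.persistent[loc] # tag hit: read from persistent storage

cache = SleepingCache()
cache.flush_to_persistent({"0xAB": b"line-A", "0xCD": b"line-B"})
print(cache.lookup("0xAB"))  # b'line-A', served without waking the data array
```
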
  • Patent number: 10565122
    Abstract: The lookup of accesses (including snoops) to cache tag ways is serialized to perform one (or fewer than all) tag way access per clock (or even slower). Thus, for an N-way set-associative cache, instead of performing the lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. Way prediction is used to select the order in which the N ways are searched, including which tag way is searched first. This helps reduce the average number of cycles and lookups required.
    Type: Grant
    Filed: May 30, 2017
    Date of Patent: February 18, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
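
The serialized, prediction-ordered lookup of patent 10565122 might be modeled like this; the probe order, the cycle counting, and the tag values are illustrative assumptions.

```python
# Toy model: serialized tag lookup, one way per "cycle", with a way
# predictor choosing the search order so the expected hit comes first.

def serialized_lookup(tag_ways, target_tag, predicted_way):
    # Probe the predicted way first, then the rest in index order.
    order = [predicted_way] + [w for w in range(len(tag_ways))
                               if w != predicted_way]
    for cycles, way in enumerate(order, start=1):
        if tag_ways[way] == target_tag:
            return way, cycles       # hit after `cycles` single-way probes
    return None, len(tag_ways)       # miss: every way was probed

tag_ways = ["t3", "t7", "t1", "t9"]            # 4-way set, one tag per way
way, cycles = serialized_lookup(tag_ways, "t1", predicted_way=2)
print(way, cycles)                             # way 2 found in 1 cycle
```
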
  • Patent number: 10409763
    Abstract: Various embodiments of the invention are described, including: (1) a method and apparatus for intelligently allocating threads within a binary translation system; (2) data cache way prediction guided by binary translation code morphing software; (3) fast interpreter hardware support on the data side; (4) out-of-order retirement; (5) decoupled load retirement in an atomic out-of-order (OOO) processor; (6) handling transactional and atomic memory in an out-of-order binary-translation-based processor; and (7) speculative memory management in a binary-translation-based out-of-order processor.
    Type: Grant
    Filed: June 30, 2014
    Date of Patent: September 10, 2019
    Assignee: Intel Corporation
    Inventors: Patrick P. Lai, Ethan Schuchman, David Keppel, Denis M. Khartikov, Polychronis Xekalakis, Joshua B. Fryman, Allan D. Knies, Naveen Neelakantam, Gregor Stellpflug, John H. Kelm, Mirem Hyuseinova Seidahmedova, Demos Pavlou, Jaroslaw Topp
  • Publication number: 20190204898
    Abstract: An integrated circuit comprising a plurality of last-level caches, a plurality of processor cores configured to access data in the plurality of last-level caches, and an interconnect network. The plurality of last-level caches can be placed in at least a high cache-power consumption mode and a low cache-power consumption mode. The plurality of last-level caches includes a first last-level cache and a second last-level cache. The interconnect network comprises a plurality of links that can be placed in at least a high link-power consumption mode and a low link-power consumption mode. The interconnect network is configured to cause a first subset of the plurality of links to be placed in the low link-power consumption mode based at least in part on the first last-level cache being in the low cache-power consumption mode.
    Type: Application
    Filed: March 6, 2019
    Publication date: July 4, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Patent number: 10324850
    Abstract: A cache system is configurable to trade power consumption for cache access latency. When it is desired for a system with a cache to conserve dynamic power, the lookup of accesses (e.g., snoops) to cache tag ways is serialized to perform one (or fewer than all) tag way access per clock (or even slower). Thus, for an N-way set-associative cache, instead of performing a lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. This takes N times more cycles, thereby reducing the access/snoop bandwidth by a factor of N. However, the power consumption of the serialized access is reduced compared to ‘all parallel’ accesses/snoops.
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: June 18, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
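
Since patent 10324850 trades an N-fold latency and bandwidth cost for lower dynamic power, a back-of-the-envelope comparison can make the trade-off concrete; the numbers below are assumptions for illustration, not measurements.

```python
# Back-of-the-envelope numbers for the serial-vs-parallel trade-off:
# probing one tag way per clock takes N times more cycles and divides
# the snoop bandwidth by N, but only one way's comparators switch per
# clock, which is where the dynamic power saving comes from.

N = 8                                  # ways in the set-associative cache
parallel = {"cycles": 1, "ways_probed_per_clock": N}
serial   = {"cycles": N, "ways_probed_per_clock": 1}

print("latency ratio  :", serial["cycles"] / parallel["cycles"])  # N x slower
print("bandwidth ratio:", 1 / N)                                  # 1/N snoops
print("peak activity  :", serial["ways_probed_per_clock"],
      "way(s) per clock vs", parallel["ways_probed_per_clock"])
```
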
  • Patent number: 10324724
    Abstract: Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: June 18, 2019
    Assignee: Intel Corporation
    Inventors: Patrick P. Lai, Tyler N. Sondag, Sebastian Winkel, Polychronis Xekalakis, Ethan Schuchman, Jayesh Iyer
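
A toy fusion pass in the spirit of patent 10324724; the tuple-based instruction encoding and the `cmp`/`jcc`/`cmp_jcc` opcode names are invented for illustration.

```python
# Toy fusion pass: adjacent compare+branch pairs in the translated
# stream are fused into a single "cmp_jcc" macro-op before decode.

def fuse(stream):
    fused, i = [], 0
    while i < len(stream):
        op = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if op[0] == "cmp" and nxt and nxt[0] == "jcc":
            # Combine the pair into one fused instruction.
            fused.append(("cmp_jcc",) + op[1:] + nxt[1:])
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused

translated = [("mov", "r1", "r2"), ("cmp", "r1", "0"), ("jcc", "ne", "loop")]
print(fuse(translated))
# [('mov', 'r1', 'r2'), ('cmp_jcc', 'r1', '0', 'ne', 'loop')]
```
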
  • Patent number: 10318428
    Abstract: A multi-core processing chip where the last-level cache functionality is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The hash function used by the processors on the chip is changed according to which of the last-level caches are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). Thus, a first hash function is used to distribute accesses (i.e., reads and writes of data blocks) to all of the last-level caches when, for example, all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches when, for example, some of the last-level caches are ‘off.’ The chip controls power consumption by turning cache slices on and off based on power states, and consequently switches dynamically among at least two hash functions.
    Type: Grant
    Filed: September 12, 2016
    Date of Patent: June 11, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
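
Patent 10318428's hash switching might be sketched as follows; the modulo hash, the slice names, and the flush/migrate ordering are simplifying assumptions.

```python
# Toy model: a block's home slice is chosen by hashing its physical
# address over the currently active slices; powering slices off swaps
# in a hash over the smaller active subset.

def home_slice(addr, active_slices):
    # Simple modulo hash for illustration; real designs use stronger hashes.
    return active_slices[hash(addr) % len(active_slices)]

all_slices = ["LLC0", "LLC1", "LLC2", "LLC3"]

addr = 0x7F3A40
print(home_slice(addr, all_slices))     # first hash: over all four slices

# Two slices enter low power: the second hash targets only the active
# pair, after the disabled slices' contents have been flushed/migrated.
active = ["LLC0", "LLC2"]
print(home_slice(addr, active))
```
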
  • Patent number: 10303603
    Abstract: A special class of loads and stores accesses a user-defined memory region where coherency and memory ordering are enforced only at the coherent point. Coherent memory requests, which are limited to the user-defined memory region, are dispatched to the common memory ordering buffer. Non-coherent memory requests (i.e., all other memory requests) can be routed via non-coherent lower-level caches to the shared last-level cache. By assigning a private, non-overlapping address space to each of the processor cores, the lower-level caches do not need to implement the logic necessary to maintain cache coherency. This can reduce power consumption and integrated circuit die area. It can also improve memory bandwidth and performance for applications with predominantly non-coherent memory accesses while still providing memory coherence for the specific memory range(s)/applications that demand it.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: May 28, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Patrick P. Lai
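
A minimal sketch of the routing rule in patent 10303603, assuming a single user-defined coherent address range; the bounds and path labels are hypothetical.

```python
# Toy router: loads/stores that fall inside the user-defined coherent
# region go to the common memory-ordering buffer (the coherent point);
# everything else takes the non-coherent path through the local caches.

COHERENT_LO, COHERENT_HI = 0x1000_0000, 0x1FFF_FFFF  # assumed region bounds

def route(addr):
    if COHERENT_LO <= addr <= COHERENT_HI:
        return "common memory-ordering buffer"   # coherency enforced here
    return "non-coherent L1/L2 -> shared LLC"    # no snoop logic needed

print(route(0x1234_5678))   # coherent path
print(route(0x8000_0000))   # non-coherent path
```
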
  • Patent number: 10282298
    Abstract: A system that uses a write-invalidate protocol has two types of stores: a traditional store that operates using a write-back policy and snoops for copies of the cache line at lower cache levels, and a store that writes, using a coherent write-through policy, directly to the last-level cache without snooping the lower cache levels. A separate store buffer may be maintained in the processor for the coherent write-through operations. A special bit may be maintained in the entries of a store buffer that is used both for traditional write-back policy stores and for coherent write-through policy stores. This bit indicates that loads and stores older than the last speculative store in the store buffer are allowed to be performed.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: May 7, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Patrick P. Lai
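
The two store types and the store-buffer bit of patent 10282298 might be modeled roughly as below; the `Entry` fields and the "performable" cut-off are one interpretation, for illustration only.

```python
# Toy store buffer shared by both store types.  Each entry carries a
# flag for coherent write-through stores; the abstract's "special bit"
# marks the point before which older loads/stores may be performed.

from collections import namedtuple

Entry = namedtuple("Entry", "addr data write_through speculative")

store_buffer = [
    Entry(0x100, 1, write_through=False, speculative=False),  # write-back store
    Entry(0x200, 2, write_through=True,  speculative=True),   # last speculative
    Entry(0x300, 3, write_through=True,  speculative=False),
]

# Operations older than the last speculative store are safe to perform.
last_spec = max(i for i, e in enumerate(store_buffer) if e.speculative)
performable = store_buffer[:last_spec]
print([hex(e.addr) for e in performable])   # ['0x100']
```
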
  • Patent number: 10241561
    Abstract: The hash function used by the processors on a multi-processor chip to distribute accesses to the various last-level caches via the links is changed according to which last-level caches (and/or links) are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). A first hash function is used to distribute accesses to all of the last-level caches and all of the links when all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches and the corresponding subset of links when some of the last-level caches are ‘off.’ Data can be sent only to the active last-level caches via active links. By shutting off links connected to caches and components that are in a lower power consumption mode, the power consumption of the chip is reduced.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: March 26, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180357169
    Abstract: A special class of loads and stores accesses a user-defined memory region where coherency and memory ordering are enforced only at the coherent point. Coherent memory requests, which are limited to the user-defined memory region, are dispatched to the common memory ordering buffer. Non-coherent memory requests (i.e., all other memory requests) can be routed via non-coherent lower-level caches to the shared last-level cache. By assigning a private, non-overlapping address space to each of the processor cores, the lower-level caches do not need to implement the logic necessary to maintain cache coherency. This can reduce power consumption and integrated circuit die area. It can also improve memory bandwidth and performance for applications with predominantly non-coherent memory accesses while still providing memory coherence for the specific memory range(s)/applications that demand it.
    Type: Application
    Filed: June 13, 2017
    Publication date: December 13, 2018
    Inventor: Patrick P. Lai
  • Publication number: 20180357172
    Abstract: A system that uses a write-invalidate protocol has two types of stores: a traditional store that operates using a write-back policy and snoops for copies of the cache line at lower cache levels, and a store that writes, using a coherent write-through policy, directly to the last-level cache without snooping the lower cache levels. A separate store buffer may be maintained in the processor for the coherent write-through operations. A special bit may be maintained in the entries of a store buffer that is used both for traditional write-back policy stores and for coherent write-through policy stores. This bit indicates that loads and stores older than the last speculative store in the store buffer are allowed to be performed.
    Type: Application
    Filed: June 13, 2017
    Publication date: December 13, 2018
    Inventor: Patrick P. Lai
  • Publication number: 20180356874
    Abstract: The hash function used by the processors on a multi-processor chip to distribute accesses to the various last-level caches via the links is changed according to which last-level caches (and/or links) are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). A first hash function is used to distribute accesses to all of the last-level caches and all of the links when all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches and the corresponding subset of links when some of the last-level caches are ‘off.’ Data can be sent only to the active last-level caches via active links. By shutting off links connected to caches and components that are in a lower power consumption mode, the power consumption of the chip is reduced.
    Type: Application
    Filed: June 13, 2017
    Publication date: December 13, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180349284
    Abstract: The lookup of accesses (including snoops) to cache tag ways is serialized to perform one (or fewer than all) tag way access per clock (or even slower). Thus, for an N-way set-associative cache, instead of performing the lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. Way prediction is used to select the order in which the N ways are searched, including which tag way is searched first. This helps reduce the average number of cycles and lookups required.
    Type: Application
    Filed: May 30, 2017
    Publication date: December 6, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180348847
    Abstract: Processors may include cache circuitry that is a significant source of power consumption. When a cache is about to be placed into a lower power mode, and based at least in part on this anticipated transition, the contents of the cache data lines are copied into persistent storage. While the cache is in the lower power mode, the tag circuitry is kept operational. When an access request is made to the cache, a relatively fast lookup of the tag in the tag array can be made, and the location where the associated cache line is stored in the persistent storage may be determined from the tag data. Upon a tag hit, the system is able to find the contents of the requested cache line in the persistent storage without returning the storage array of the cache to a fully operational state.
    Type: Application
    Filed: May 30, 2017
    Publication date: December 6, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180336143
    Abstract: A first cache is paired at the same cache level with a second cache that has higher capacity but is slower. Accesses to both caches are performed in parallel, and whichever cache hits and returns the data first is considered a valid cache read-hit. The higher-capacity cache is configured to have multiple power-saving modes while also having a high level of associativity in order to minimize conflict and capacity misses. Transfers can move cache lines between the two caches at the same level (i.e., without crossing a large inter-cache-level or inter-processor fabric) in order to adapt to changing access patterns. This functionality allows access latency to be traded off against power consumption.
    Type: Application
    Filed: May 22, 2017
    Publication date: November 22, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
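
A toy model of the paired same-level caches in publication 20180336143; the fixed latencies and dictionary-backed caches are illustrative assumptions.

```python
# Toy model of paired same-level caches: both are probed in parallel
# and the first hit wins.  Here "first" is decided by each cache's
# assumed access latency.

class PairedCache:
    def __init__(self, fast, slow, fast_latency=2, slow_latency=6):
        self.fast, self.slow = fast, slow
        self.fast_latency, self.slow_latency = fast_latency, slow_latency

    def read(self, addr):
        # Issue both lookups "in parallel"; take whichever hit is earliest.
        hits = []
        if addr in self.fast:
            hits.append((self.fast_latency, self.fast[addr]))
        if addr in self.slow:
            hits.append((self.slow_latency, self.slow[addr]))
        return min(hits)[1] if hits else None   # earliest valid read-hit

pair = PairedCache(fast={0x10: "A"}, slow={0x10: "A-copy", 0x20: "B"})
print(pair.read(0x10))   # "A": the small fast cache answers first
print(pair.read(0x20))   # "B": only the large slow cache holds it
```
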
  • Publication number: 20180210836
    Abstract: A multi-core processing chip where the last-level cache is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The various processors of the chip decide which last-level cache is to hold a given data block by applying a temperature- or reliability-dependent hash function to the physical address. While the system is running, a last-level cache that is overheating or being overused is taken out of use by changing the hash function. Before accesses to the overheating cache are prevented, the contents of that cache are migrated to other last-level caches per the changed hash function. When a core processor associated with a last-level cache is shut down, when processes/threads are removed from that core, or when the core is overheating, use of the associated last-level cache can likewise be prevented by changing the hash function, with the contents migrated to other caches.
    Type: Application
    Filed: January 24, 2017
    Publication date: July 26, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
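
Publication 20180210836's temperature-driven re-hashing and migration might look like this in miniature; the threshold, temperatures, and modulo hash are invented for illustration.

```python
# Toy model: when a slice overheats, it is dropped from the hash and
# its contents are migrated to the slices the new hash maps them to.

def home(addr, slices):
    return slices[hash(addr) % len(slices)]   # illustrative modulo hash

slices = {"LLC0": 60, "LLC1": 95, "LLC2": 58}   # name -> temperature (C)
contents = {"LLC0": [0x100], "LLC1": [0x200, 0x300], "LLC2": []}

THRESHOLD = 90                                  # assumed overheating limit
healthy = [s for s, temp in sorted(slices.items()) if temp < THRESHOLD]

# Migrate lines out of the overheating slice before it stops taking accesses.
for s in list(contents):
    if s not in healthy:
        for addr in contents.pop(s):
            contents.setdefault(home(addr, healthy), []).append(addr)

print(healthy)    # ['LLC0', 'LLC2']
print(contents)   # LLC1's lines now live in surviving slices per the new hash
```
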
  • Publication number: 20180137054
    Abstract: A cache system is configurable to trade power consumption for cache access latency. When it is desired for a system with a cache to conserve dynamic power, the lookup of accesses (e.g., snoops) to cache tag ways is serialized to perform one (or fewer than all) tag way access per clock (or even slower). Thus, for an N-way set-associative cache, instead of performing a lookup/comparison on the N tag ways in parallel, the lookups are performed one tag way at a time. This takes N times more cycles, thereby reducing the access/snoop bandwidth by a factor of N. However, the power consumption of the serialized access is reduced compared to ‘all parallel’ accesses/snoops.
    Type: Application
    Filed: November 11, 2016
    Publication date: May 17, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer
  • Publication number: 20180074964
    Abstract: A multi-core processing chip where the last-level cache functionality is implemented by multiple last-level caches (a.k.a. cache slices) that are physically and logically distributed. The hash function used by the processors on the chip is changed according to which of the last-level caches are active (e.g., ‘on’) and which are in a lower power consumption mode (e.g., ‘off’). Thus, a first hash function is used to distribute accesses (i.e., reads and writes of data blocks) to all of the last-level caches when, for example, all of the last-level caches are ‘on.’ A second hash function is used to distribute accesses to the appropriate subset of the last-level caches when, for example, some of the last-level caches are ‘off.’ The chip controls power consumption by turning cache slices on and off based on power states, and consequently switches dynamically among at least two hash functions.
    Type: Application
    Filed: September 12, 2016
    Publication date: March 15, 2018
    Inventors: Patrick P. Lai, Robert Allen Shearer