Patents by Inventor William C. Hasenplaugh

William C. Hasenplaugh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Processors, methods, systems, and instructions to load multiple data elements to destination storage locations other than packed data registers

Patent number: 11068264

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.

Type: Grant

Filed: August 9, 2019

Date of Patent: July 20, 2021

Assignee: Intel Corporation

Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO LOAD MULTIPLE DATA ELEMENTS TO DESTINATION STORAGE LOCATIONS OTHER THAN PACKED DATA REGISTERS

Publication number: 20190384601

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.

Type: Application

Filed: August 9, 2019

Publication date: December 19, 2019

Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
Low energy consumption mantissa multiplication for floating point multiply-add operations

Patent number: 10402168

Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.

Type: Grant

Filed: October 1, 2016

Date of Patent: September 3, 2019

Assignee: Intel Corporation

Inventors: William C. Hasenplaugh, Kermin E. Fleming, Jr., Tryggve Fossum, Simon C. Steely, Jr.
Processors, methods, systems, and instructions to load multiple data elements to destination storage locations other than packed data registers

Patent number: 10379855

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.

Type: Grant

Filed: September 30, 2016

Date of Patent: August 13, 2019

Assignee: Intel Corporation

Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
High bandwidth full-block write commands

Patent number: 10102124

Abstract: A micro-architecture may provide hardware and software support for a high bandwidth write command. The micro-architecture may invoke a method to perform the high bandwidth write command. The method may comprise sending a write request from a requester to a record keeping structure. The write request may have a memory address of a memory that stores requested data. The method may further comprise determining copies of the requested data being present in a distributed cache system outside the memory, sending invalidation requests to elements holding copies of the requested data in the distributed cache system, sending a notification to the requester to inform it of the presence of copies of the requested data, and sending a write response message after a latest value of the requested data and all invalidation acknowledgements have been received.

Type: Grant

Filed: December 28, 2011

Date of Patent: October 16, 2018

Assignee: Intel Corporation

Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer, Samantika Subramaniam
LOW ENERGY CONSUMPTION MANTISSA MULTIPLICATION FOR FLOATING POINT MULTIPLY-ADD OPERATIONS

Publication number: 20180095728

Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.

Type: Application

Filed: October 1, 2016

Publication date: April 5, 2018

Applicant: Intel Corporation

Inventors: William C. Hasenplaugh, Kermin E. Fleming, Jr., Tryggve Fossum, Simon C. Steely, Jr.
PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO LOAD MULTIPLE DATA ELEMENTS TO DESTINATION STORAGE LOCATIONS OTHER THAN PACKED DATA REGISTERS

Publication number: 20180095756

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.

Type: Application

Filed: September 30, 2016

Publication date: April 5, 2018

Applicant: Intel Corporation

Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
Hardware apparatuses and methods to control cache line coherency

Patent number: 9934146

Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core.

Type: Grant

Filed: September 26, 2014

Date of Patent: April 3, 2018

Assignee: Intel Corporation

Inventors: Simon C. Steely, Jr., Samantika S. Sury, William C. Hasenplaugh
Multicast tree-based data distribution in distributed shared cache

Patent number: 9734069

Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; a processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of: a tag directory or a processing core that last accessed the certain entry, an update read request with respect to the certain cache entry; and responsive to receiving an update of the certain cache entry, broadcast the update to at least one of: one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry.

Type: Grant

Filed: December 11, 2014

Date of Patent: August 15, 2017

Assignee: Intel Corporation

Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Samantika S. Sury
Address range priority mechanism

Patent number: 9727482

Abstract: Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.

Type: Grant

Filed: September 26, 2016

Date of Patent: August 8, 2017

Assignee: Intel Corporation

Inventors: Simon Steely, Jr., Samantika S. Sury, William C. Hasenplaugh
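
The range check described in the abstract above can be illustrated with a minimal Python sketch. It is not the patented design: the class name, the `add_range` and `assign` methods, the priority labels, and the cache size are all hypothetical, chosen only to show a priority being assigned when a virtual address falls inside a configured range.

```python
class AddressRangePriorityAssigner:
    """Toy model of an address range priority assigner (ARP); names are hypothetical."""

    def __init__(self, default_priority="normal"):
        self.ranges = []  # list of (start, end_exclusive, priority)
        self.default_priority = default_priority

    def add_range(self, start, end, priority):
        """Register a virtual address range [start, end) with a priority."""
        if start >= end:
            raise ValueError("range must be non-empty")
        self.ranges.append((start, end, priority))

    def assign(self, virtual_address):
        """Return the priority of the first range containing the address."""
        for start, end, priority in self.ranges:
            if start <= virtual_address < end:
                return priority
        return self.default_priority


CACHE_CAPACITY = 32 * 1024  # assumed cache size in bytes, for illustration only

arp = AddressRangePriorityAssigner()
# A high-priority range of 16 KiB, smaller than the assumed cache capacity.
arp.add_range(0x1000, 0x1000 + 16 * 1024, "high")

inside = arp.assign(0x2000)   # address inside the range
outside = arp.assign(0x9000)  # address outside the range
```

In this sketch an address inside the registered range receives the "high" label and any other address falls back to the default.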
Domain state

Patent number: 9588889

Abstract: Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.

Type: Grant

Filed: December 29, 2011

Date of Patent: March 7, 2017

Assignee: Intel Corporation

Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
ADDRESS RANGE PRIORITY MECHANISM

Publication number: 20170010974

Abstract: Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.

Type: Application

Filed: September 26, 2016

Publication date: January 12, 2017

Inventors: Simon Steely, Jr., Samantika S. Sury, William C. Hasenplaugh
Address range priority mechanism

Patent number: 9477610

Abstract: Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.

Type: Grant

Filed: December 23, 2011

Date of Patent: October 25, 2016

Assignee: Intel Corporation

Inventors: Simon Steely, Jr., Samantika Subramaniam, William C. Hasenplaugh
Method and apparatus for optimizing the usage of cache memories

Patent number: 9418016

Abstract: A method and apparatus to reduce unnecessary write backs of cached data to a main memory and to optimize the usage of a cache memory tag directory. In one embodiment of the invention, the power consumption of a processor can be reduced by eliminating write backs of cache memory lines that have information that has reached its end-of-life. In one embodiment of the invention, when a processing unit is required to clear one or more cache memory lines, it uses a write-zero command to clear the one or more cache memory lines. The processing unit does not perform a write operation to move or pass data values of zero to the one or more cache memory lines. By doing so, it reduces the power consumption of the processing unit.

Type: Grant

Filed: December 21, 2010

Date of Patent: August 16, 2016

Assignee: Intel Corporation

Inventors: Simon C. Steely, Jr., Joel S. Emer, William C. Hasenplaugh
MULTICAST TREE-BASED DATA DISTRIBUTION IN DISTRIBUTED SHARED CACHE

Publication number: 20160170880

Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; a processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of: a tag directory or a processing core that last accessed the certain entry, an update read request with respect to the certain cache entry; and responsive to receiving an update of the certain cache entry, broadcast the update to at least one of: one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry.

Type: Application

Filed: December 11, 2014

Publication date: June 16, 2016

Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Samantika S. Sury
Hardware compilation and/or translation with fault detection and roll back functionality

Patent number: 9317263

Abstract: Hardware compilation and/or translation with fault detection and roll back functionality are disclosed. Compilation and/or translation logic receives programs encoded in one language, and encodes the programs into a second language including instructions to support processor features not encoded into the original language encoding of the programs. In one embodiment, an execution unit executes instructions of the second language including an operation-check instruction to perform a first operation and record the first operation result for a comparison, and an operation-test instruction to perform a second operation and a fault detection operation by comparing the second operation result to the recorded first operation result.

Type: Grant

Filed: October 14, 2014

Date of Patent: April 19, 2016

Assignee: Intel Corporation

Inventors: Nicholas Cheng Hwa Chee, Tryggve Fossum, William C. Hasenplaugh
HARDWARE APPARATUSES AND METHODS TO CONTROL CACHE LINE COHERENCY

Publication number: 20160092354

Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core.

Type: Application

Filed: September 26, 2014

Publication date: March 31, 2016

Inventors: Simon C. Steely, Jr., Samantika S. Sury, William C. Hasenplaugh
Scalable multi-layer 2D-mesh routers

Patent number: 9294419

Abstract: Architectures, apparatus and systems employing scalable multi-layer 2D-mesh routers. A 2D router mesh comprises bidirectional pairs of linked paths coupled between pairs of IO interfaces and configured in a plurality of rows and columns forming a 2D mesh. Router nodes are located at the intersections of the rows and columns, and are configured to forward data units between IO inputs and outputs coupled to the mesh at its edges through use of shortest path routes defined by agents at the IO interfaces. Multiple instances of the 2D meshes may be employed to support bandwidth scaling of the router architecture. One implementation of a multi-layer 2D mesh is built using a standard tile that is tessellated to form a 2D array of standard tiles, with each 2D mesh layer offset and overlaid relative to the other 2D mesh layers. IO interfaces are then coupled to the multi-layer 2D mesh via muxes/demuxes and/or crossbar interconnects.

Type: Grant

Filed: June 26, 2013

Date of Patent: March 22, 2016

Assignee: Intel Corporation

Inventors: William C. Hasenplaugh, Tryggve Fossum, Judson S. Leonard
Signature based hit-predicting cache

Patent number: 9262327

Abstract: An apparatus may comprise a cache file having a plurality of cache lines and a hit predictor. The hit predictor may contain a table of counter values indexed with signatures that are associated with the plurality of cache lines. The apparatus may fill cache lines into the cache file with either low or high priority. Low priority lines may be chosen to be replaced by a replacement algorithm before high priority lines. In this way, the cache naturally may contain more high priority lines than low priority ones. This priority filling process may improve the performance of most replacement schemes including the best known schemes which are already doing better than LRU.

Type: Grant

Filed: June 29, 2012

Date of Patent: February 16, 2016

Assignee: Intel Corporation

Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Aamer Jaleel, Joel S. Emer, Carole-Jean Wu
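
A rough Python sketch of a counter table indexed by signature, as the abstract above describes, follows. The abstract does not say how the counters are trained or how a signature is formed, so the saturating-counter rule, the threshold, the table size, and every name here (`HitPredictor`, `fill_priority`, `record_hit`, `record_dead_eviction`) are assumptions made for illustration only.

```python
class HitPredictor:
    """Toy hit predictor: a table of saturating counters indexed by signature."""

    def __init__(self, table_size=16, threshold=1, max_count=3):
        self.counters = [0] * table_size
        self.threshold = threshold  # assumed cutoff for a high-priority fill
        self.max_count = max_count  # assumed saturation limit

    def _index(self, signature):
        # Assumed mapping from a signature to a table slot.
        return signature % len(self.counters)

    def fill_priority(self, signature):
        """Choose the priority used when filling a line with this signature."""
        if self.counters[self._index(signature)] >= self.threshold:
            return "high"
        return "low"

    def record_hit(self, signature):
        """Assumed training rule: a hit on a line raises its signature's counter."""
        i = self._index(signature)
        self.counters[i] = min(self.counters[i] + 1, self.max_count)

    def record_dead_eviction(self, signature):
        """Assumed training rule: eviction without reuse lowers the counter."""
        i = self._index(signature)
        self.counters[i] = max(self.counters[i] - 1, 0)


predictor = HitPredictor()
before = predictor.fill_priority(5)   # no history yet
predictor.record_hit(5)
after_hit = predictor.fill_priority(5)
predictor.record_dead_eviction(5)
after_eviction = predictor.fill_priority(5)
```

A replacement policy would then prefer to evict lines filled with low priority before lines filled with high priority, which is the behavior the abstract attributes to the apparatus.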
Update mask for handling interaction between fills and updates

Patent number: 9251073

Abstract: A multi-core processor implements a cache coherency protocol in which probe messages are address-ordered on a probe channel while responses are un-ordered on a response channel. When a first core generates a read of an address that misses in the first core's cache, a line fill is initiated. If a second core is writing the same address, the second core generates an update on the address-ordered probe channel. The second core's update may arrive before or after the first core's line fill returns. If the update arrived before the fill returned, a mask is maintained to indicate which portions of the line were modified by the update so that the late arriving line fill only modifies portions of the line that were unaffected by the earlier-arriving update.

Type: Grant

Filed: December 31, 2012

Date of Patent: February 2, 2016

Assignee: Intel Corporation

Inventors: Simon C. Steely, William C. Hasenplaugh
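
The masking step in the abstract above can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the line is 8 bytes, the mask has one flag per byte, and the names `PendingLine`, `apply_update`, and `apply_fill` are hypothetical rather than taken from the patent.

```python
LINE_SIZE = 8  # bytes per cache line in this toy model (assumed)


class PendingLine:
    """A cache line whose fill is still outstanding."""

    def __init__(self):
        self.data = bytearray(LINE_SIZE)
        # One flag per byte; True means an update already wrote that byte.
        self.update_mask = [False] * LINE_SIZE

    def apply_update(self, offset, payload):
        """An update from another core arrives before the fill returns."""
        for i, value in enumerate(payload):
            self.data[offset + i] = value
            self.update_mask[offset + i] = True

    def apply_fill(self, fill_data):
        """The late fill writes only bytes the earlier update left untouched."""
        for i in range(LINE_SIZE):
            if not self.update_mask[i]:
                self.data[i] = fill_data[i]
        self.update_mask = [False] * LINE_SIZE  # fill complete, mask cleared


line = PendingLine()
line.apply_update(2, b"\xAA\xBB")   # second core's update lands first
line.apply_fill(bytes(range(8)))    # first core's fill returns afterwards
result = bytes(line.data)
```

After both steps, bytes 2 and 3 keep the values from the update while every other byte comes from the fill.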
