Patents by Inventor Simon C. Steely

Simon C. Steely has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 6961825
    Abstract: A distributed processing system includes a cache coherency mechanism that essentially encodes network routing information into sectored presence bits. The mechanism organizes the sectored presence bits as one or more arbitration masks that system switches decode and use directly to route invalidate messages through one or more higher levels of the system. The lower level or levels of the system use local routing mechanisms, such as local directories, to direct the invalidate messages to the individual processors that are holding the data of interest.
    Type: Grant
    Filed: January 24, 2001
    Date of Patent: November 1, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Simon C. Steely, Jr., Stephen Van Doren, Madhumitra Sharma
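As a rough illustration of the arbitration-mask idea in the abstract above, the sketch below groups presence bits into sectors, one sector per top-level switch port, so the sector bits can be used directly to decide which ports receive invalidate messages. The sector width, function names, and bit layout are assumptions for illustration, not details from the patent.

```python
# Hypothetical layout: 4 presence bits per sector, one sector per switch port.
SECTOR_BITS = 4


def sectors_with_sharers(presence_bits, num_sectors):
    """Return the switch-port indices whose sector has at least one presence bit set."""
    ports = []
    for sector in range(num_sectors):
        sector_mask = ((1 << SECTOR_BITS) - 1) << (sector * SECTOR_BITS)
        if presence_bits & sector_mask:
            ports.append(sector)
    return ports


# Sharers behind ports 0 and 2: the switch fans invalidates out only to those
# ports; local directories below each port finish the routing.
presence = 0b0001_0000_0011
print(sectors_with_sharers(presence, num_sectors=3))  # [0, 2]
```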
  • Patent number: 6904465
    Abstract: A multiple-processor system in which a commit message is returned to a source processor that requests a memory access operation so as to indicate the apparent completion of the operation includes a multiple-level switch unit linking nodes that contain the processors. The switch unit includes multiple input switches each of which receives messages from multiple nodes, and a set of output switches whose inputs are the outputs of the input switches and whose outputs are the inputs of the nodes. Each switch processes messages in the order in which they are received by the switch and each output switch follows the same rule as the other output switches.
    Type: Grant
    Filed: April 26, 2001
    Date of Patent: June 7, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Simon C. Steely, Jr., Madhumitra Sharma, Stephen R. Van Doren
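To make the ordering rule in the abstract concrete, here is a minimal sketch that models each switch stage as a FIFO that forwards messages strictly in arrival order; with every stage following the same rule, a later commit message can never pass an earlier message on its way to a node. The class names are illustrative assumptions, not terminology from the patent.

```python
from collections import deque


class SwitchStage:
    """A switch modeled as a FIFO: messages leave in the order they arrived."""

    def __init__(self):
        self.queue = deque()

    def receive(self, msg):
        self.queue.append(msg)

    def forward_all(self, next_stage):
        while self.queue:
            next_stage.receive(self.queue.popleft())


class Node:
    def __init__(self):
        self.delivered = []

    def receive(self, msg):
        self.delivered.append(msg)


input_switch, output_switch, node = SwitchStage(), SwitchStage(), Node()
for m in ["invalidate A", "commit A"]:
    input_switch.receive(m)
input_switch.forward_all(output_switch)
output_switch.forward_all(node)
print(node.delivered)  # ['invalidate A', 'commit A'] -- the commit never passes the invalidate
```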
  • Patent number: 6801986
    Abstract: A method for executing a load locked and a store conditional instruction in a processor achieves an atomic read-write operation to a memory block. First, the load locked instruction is executed to read a memory block; in response, the processor issues a read modify system command to read the block and take ownership of it, sets a lock flag for the address of the memory block, and writes the value of the memory block into its cache as a cache copy of the memory block. The lock flag is reset if the processor receives any invalidate message for the cache copy of the memory block. If an ownership request message is received after execution of the load locked instruction, the processor waits for a selected time interval before surrendering ownership of the memory block.
    Type: Grant
    Filed: August 20, 2001
    Date of Patent: October 5, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Simon C. Steely, Jr., Stephen R. Van Doren, Madhumitra Sharma
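The load locked / store conditional flow described above can be sketched in a few lines. The following Python model is a simplification under assumed names (Processor, lock_flag, and store_conditional are illustrative, not the patent's terms): load locked takes ownership, caches the block, and sets a lock flag; any invalidate clears the flag; a conditional store succeeds only if the flag survived.

```python
class Processor:
    def __init__(self, memory):
        self.memory = memory          # shared memory, modeled as a dict
        self.cache = {}               # address -> cached value
        self.lock_flag = {}           # address -> bool

    def load_locked(self, addr):
        # Read with intent to modify: take ownership, keep a cache copy,
        # and set the lock flag for this address.
        value = self.memory[addr]
        self.cache[addr] = value
        self.lock_flag[addr] = True
        return value

    def on_invalidate(self, addr):
        # Any invalidate for the block resets the lock flag.
        self.cache.pop(addr, None)
        self.lock_flag[addr] = False

    def store_conditional(self, addr, value):
        # Succeeds only if no invalidate arrived since the load locked.
        if self.lock_flag.get(addr):
            self.cache[addr] = self.memory[addr] = value
            self.lock_flag[addr] = False
            return True
        return False


p = Processor(memory={0x100: 7})
old = p.load_locked(0x100)
print(p.store_conditional(0x100, old + 1))  # True: no invalidate intervened
```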
  • Patent number: 6769057
    Abstract: A “data verified”, or DV, bit is included in an instruction to indicate whether the instruction or a dependent instruction may be associated with the retrieved data as soon as the data is available or should instead be associated with the data only after verification. If the DV bit is in a first state, e.g., not set, the system may issue instructions that use associated data as soon as the data is available. If the DV bit is in a second state, e.g., set, the system does not issue the instructions that use the data until the data is verified. The system or user sets the DV bit based on an analysis of an instruction set that includes the instruction and/or accumulated profile data from previous use or uses of the software. The DV bit is set in a LOAD instruction if the dependent user instruction is close enough in the instruction set that the user instruction is likely to issue before the data is verified and/or if the LOAD instruction is part of a relatively long chain of instructions.
    Type: Grant
    Filed: January 22, 2001
    Date of Patent: July 27, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Simon C. Steely, Jr.
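The DV-bit policy in the abstract reduces to a simple decision: set the bit when a dependent is likely to issue before the loaded data can be verified, or when the LOAD heads a long dependence chain. The sketch below encodes that rule; the thresholds and names are assumptions chosen for illustration, not values from the patent.

```python
DISTANCE_THRESHOLD = 3   # hypothetical: dependent issues "close enough" to the LOAD
CHAIN_THRESHOLD = 8      # hypothetical: "relatively long" dependence chain


def should_set_dv_bit(distance_to_dependent, chain_length):
    """Set DV if the dependent is likely to issue before the data is verified,
    or if the LOAD heads a long chain of dependent instructions."""
    return distance_to_dependent <= DISTANCE_THRESHOLD or chain_length >= CHAIN_THRESHOLD


print(should_set_dv_bit(distance_to_dependent=2, chain_length=4))   # True
print(should_set_dv_bit(distance_to_dependent=10, chain_length=4))  # False
```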
  • Patent number: 6647466
    Abstract: A system for adaptively bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described. Each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache. When an invalidate request is received at a given cache hierarchy, each cache level is searched for the address specified by the invalidate request. When an address match is detected, the state of the respective cache line is changed to the invalid state, although the address of the cache line is left in the tag store. Thereafter, if the processor or entity associated with this cache hierarchy issues its own request for this same cache line, the cache hierarchy begins searching the tag store of each level starting with the lowest cache level.
    Type: Grant
    Filed: January 25, 2001
    Date of Patent: November 11, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Simon C. Steely, Jr.
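As a hedged illustration of the tag-store behavior in the abstract: an invalidate marks a line invalid but leaves its address in each level's tag store, and a later local request searches the levels starting from the lowest one. The class layout and the search order shown are assumptions for the sketch, not details taken from the patent.

```python
INVALID, VALID = "invalid", "valid"


class CacheLevel:
    def __init__(self, name):
        self.name = name
        self.tags = {}  # address -> state

    def invalidate(self, addr):
        if addr in self.tags:
            self.tags[addr] = INVALID  # state cleared, tag retained


def lookup(hierarchy, addr):
    # Search each level's tag store, starting with the lowest cache level.
    for level in reversed(hierarchy):
        if level.tags.get(addr) == VALID:
            return level.name
    return "miss"


l1, l2, l3 = CacheLevel("L1"), CacheLevel("L2"), CacheLevel("L3")
for lvl in (l1, l2, l3):
    lvl.tags[0x40] = VALID
    lvl.invalidate(0x40)
print(lookup([l1, l2, l3], 0x40))  # 'miss' -- every copy invalid, tags still present
```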
  • Patent number: 6636948
    Abstract: A performance enhancing change-to-dirty operation (CTD) is disclosed wherein contention among several processors trying to gain ownership of a block of data is obviated by arranging the CTD to always succeed. A method and a system are disclosed in which a processor in a multiprocessor system that holds a copy of data gains assured ownership of the data, which the processor may then write. The method accounts for the conditions that may exist, including a scenario in which the requesting processor may have to wait for ownership. Conditions are handled where the memory is the “owner” of the data, where other processors are requesting ownership, and where copies of the data exist at other processors. The method provides for messages to other processors having copies of the data, informing them that the data is now invalid.
    Type: Grant
    Filed: April 13, 2001
    Date of Patent: October 21, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Simon C. Steely, Jr., Stephen R. Van Doren, Madhumitra Sharma
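The always-succeeding change-to-dirty can be pictured as a single transition: whatever the starting conditions, the requester ends up as owner, other sharers receive invalidates, and in some cases the requester simply waits for ownership to arrive rather than retrying. The function below is a minimal sketch of that outcome under assumed state names; it is not the patent's protocol specification.

```python
def change_to_dirty(requester, owner, sharers):
    """Return (new_owner, invalidate_targets, must_wait) for an always-succeeding CTD.

    must_wait models the case where ownership is held elsewhere and the
    requester has to wait for it to be handed over rather than retry.
    """
    invalidates = {p for p in sharers if p != requester}
    must_wait = owner not in (None, "memory", requester)
    return requester, invalidates, must_wait


new_owner, to_invalidate, wait = change_to_dirty("P1", owner="memory", sharers={"P1", "P2", "P3"})
print(new_owner, sorted(to_invalidate), wait)  # P1 ['P2', 'P3'] False
```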
  • Publication number: 20030076831
    Abstract: A technique efficiently combines data and ordered transactions in a multiprocessor system having a plurality of nodes interconnected by a hierarchical switch. The technique further enables an ordered channel of the system to make progress in the presence of a blocked interface within the hierarchical switch. Specifically, the technique combines ordered components and unordered data components into common packets that are transmitted over an ordered channel of the system in the event that ordered and unordered components are generated simultaneously. The technique further allows, in the event that a combined packet in the ordered channel is stalled due to a data buffer dependency, the packet to be decomposed into an ordered component and an unordered data component wherein the ordered component remains in the ordered channel and the unordered data component is reassigned to the unordered data channel.
    Type: Application
    Filed: March 21, 2001
    Publication date: April 24, 2003
    Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma
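A small sketch of the combine-then-decompose behavior described in the abstract: an ordered component and an unordered data component generated together travel as one packet on the ordered channel, and if that packet stalls on a data-buffer dependency it is split, with the data half reassigned to the unordered channel. The data model and channel representation are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    ordered_part: str
    data_part: Optional[str] = None  # unordered data component, if any


def send(packet, ordered_channel, data_channel, stalled):
    if stalled and packet.data_part is not None:
        # Decompose: the ordered half stays on the ordered channel,
        # the data half is reassigned to the unordered data channel.
        ordered_channel.append(Packet(packet.ordered_part))
        data_channel.append(packet.data_part)
    else:
        ordered_channel.append(packet)


ordered, unordered = [], []
send(Packet("probe response", data_part="cache line 0x80"), ordered, unordered, stalled=True)
print(len(ordered), len(unordered))  # 1 1 -- the ordered channel keeps making progress
```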
  • Publication number: 20030037223
    Abstract: A method for executing a load locked and a store conditional instruction in a processor achieves an atomic read-write operation to a memory block. First, the load locked instruction is executed to read a memory block; in response, the processor issues a read modify system command to read the block and take ownership of it, sets a lock flag for the address of the memory block, and writes the value of the memory block into its cache as a cache copy of the memory block. The lock flag is reset if the processor receives any invalidate message for the cache copy of the memory block. If an ownership request message is received after execution of the load locked instruction, the processor waits for a selected time interval before surrendering ownership of the memory block.
    Type: Application
    Filed: August 20, 2001
    Publication date: February 20, 2003
    Inventors: Simon C. Steely, Stephen R. Van Doren, Madhumitra Sharma
  • Publication number: 20020194290
    Abstract: A multiple-processor system in which a commit message is returned to a source processor that requests a memory access operation so as to indicate the apparent completion of the operation includes a multiple-level switch unit linking nodes that contain the processors. The switch unit includes multiple input switches each of which receives messages from multiple nodes, and a set of output switches whose inputs are the outputs of the input switches and whose outputs are the inputs of the nodes. Each switch processes messages in the order in which they are received by the switch and each output switch follows the same rule as the other output switches.
    Type: Application
    Filed: April 26, 2001
    Publication date: December 19, 2002
    Inventors: Simon C. Steely, Madhumitra Sharma, Stephen R. Van Doren
  • Patent number: 6493801
    Abstract: An adaptive cache coherent purging protocol includes recognizing that system performance, especially latency, is affected by when a cache is purged. The occurrences of performance enhancing and degrading events regarding a cache are counted and compared to a threshold. When the threshold is triggered, the cache becomes a candidate for purging. In an embodiment, a time out delay is implemented before actual purging occurs. When the threshold is not triggered but a cache event occurs, a fake time out delay is triggered and the count is adaptively either raised, lowered or set to zero in response to performance enhancing and/or degrading events. The effect is to make the actual purging more likely if the history of cache events indicates that performance would be enhanced thereby, or less likely if the history indicates that performance would be degraded thereby.
    Type: Grant
    Filed: January 26, 2001
    Date of Patent: December 10, 2002
    Assignee: Compaq Computer Corporation
    Inventors: Simon C. Steely, Jr., Nikolaos Hardavellas
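The adaptive purge policy above amounts to training a counter: performance enhancing events push a cache toward being a purge candidate, degrading events push it away, and crossing a threshold starts the purge time-out while anything short of it triggers only a "fake" time-out that keeps training the count. The sketch below encodes that loop with an assumed threshold and unit adjustments; the actual adjustments in the patent may differ.

```python
THRESHOLD = 4  # hypothetical trigger point


class PurgeCandidate:
    def __init__(self):
        self.count = 0

    def on_event(self, enhancing):
        # Enhancing events raise the count; degrading events lower it (never below zero).
        self.count = self.count + 1 if enhancing else max(0, self.count - 1)
        if self.count >= THRESHOLD:
            return "start purge time-out"
        return "fake time-out (keep training the count)"


c = PurgeCandidate()
for enhancing in [True, True, False, True, True, True]:
    decision = c.on_event(enhancing)
print(decision)  # 'start purge time-out'
```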
  • Publication number: 20020152358
    Abstract: A performance enhancing change-to-dirty operation (CTD) is disclosed wherein contention among several processors trying to gain ownership of a block of data is obviated by arranging the CTD to always succeed. A method and a system are disclosed in which a processor in a multiprocessor system that holds a copy of data gains assured ownership of the data, which the processor may then write. The method accounts for the conditions that may exist, including a scenario in which the requesting processor may have to wait for ownership. Conditions are handled where the memory is the “owner” of the data, where other processors are requesting ownership, and where copies of the data exist at other processors. The method provides for messages to other processors having copies of the data, informing them that the data is now invalid.
    Type: Application
    Filed: April 13, 2001
    Publication date: October 17, 2002
    Inventors: Simon C. Steely, Stephen R. Van Doren, Madhumitra Sharma
  • Publication number: 20020146022
    Abstract: A credit-based, flow control technique utilizes a plurality of counters to conserve resources of a switch fabric within a modular multiprocessor system while ensuring that transaction packets pending in virtual channel queues of the fabric efficiently progress through those resources. The multiprocessor system includes a plurality of nodes interconnected by the switch fabric that extends from a global input port of a node through a hierarchical switch to a global output port of the same or another node. The resources include shared buffers within the global ports and hierarchical switch. Each counter is associated with a virtual channel queue and the flow control technique uses the counters to essentially create the structure of the shared buffers.
    Type: Application
    Filed: April 9, 2001
    Publication date: October 10, 2002
    Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma, Gregory E. Tierney
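Credit-based flow control of the kind described above can be reduced to a counter per virtual channel: a packet is launched only if a credit (a free slot in the shared downstream buffer) is available, and the credit is returned when the slot frees up. The buffer size and method names below are assumptions for the sketch.

```python
class CreditedChannel:
    def __init__(self, credits):
        self.credits = credits  # free slots in the shared downstream buffer

    def try_send(self, packet):
        if self.credits == 0:
            return False   # would overflow the shared buffer: hold the packet
        self.credits -= 1  # consume a credit for the slot the packet will occupy
        return True

    def on_credit_return(self):
        self.credits += 1  # downstream freed a slot


vc0 = CreditedChannel(credits=2)
print(vc0.try_send("req A"), vc0.try_send("req B"), vc0.try_send("req C"))  # True True False
vc0.on_credit_return()
print(vc0.try_send("req C"))  # True
```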
  • Publication number: 20020103976
    Abstract: An adaptive cache coherent purging protocol includes recognizing that system performance, especially latency, is affected by when a cache is purged. The occurrences of performance enhancing and degrading events regarding a cache are counted and compared to a threshold. When the threshold is triggered, the cache becomes a candidate for purging. In an embodiment, a time out delay is implemented before actual purging occurs. When the threshold is not triggered but a cache event occurs, a fake time out delay is triggered and the count is adaptively either raised, lowered or set to zero in response to performance enhancing and/or degrading events. The effect is to make the actual purging more likely if the history of cache events indicates that performance would be enhanced thereby, or less likely if the history indicates that performance would be degraded thereby.
    Type: Application
    Filed: January 26, 2001
    Publication date: August 1, 2002
    Inventors: Simon C. Steely, Nikolaos Hardavellas
  • Publication number: 20020099913
    Abstract: A system for adaptively bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described. Each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache. When an invalidate request is received at a given cache hierarchy, each cache level is searched for the address specified by the invalidate request. When an address match is detected, the state of the respective cache line is changed to the invalid state, although the address of the cache line is left in the tag store. Thereafter, if the processor or entity associated with this cache hierarchy issues its own request for this same cache line, the cache hierarchy begins searching the tag store of each level starting with the lowest cache level.
    Type: Application
    Filed: January 25, 2001
    Publication date: July 25, 2002
    Inventor: Simon C. Steely
  • Publication number: 20020099927
    Abstract: A “data verified”, or DV, bit is included in an instruction to indicate whether the instruction or a dependent instruction may be associated with the retrieved data as soon as the data is available or should instead be associated with the data only after verification. If the DV bit is in a first state, e.g., not set, the system may issue instructions that use associated data as soon as the data is available. If the DV bit is in a second state, e.g., set, the system does not issue the instructions that use the data until the data is verified. The system or user sets the DV bit based on an analysis of an instruction set that includes the instruction and/or accumulated profile data from previous use or uses of the software. The DV bit is set in a LOAD instruction if the dependent user instruction is close enough in the instruction set that the user instruction is likely to issue before the data is verified and/or if the LOAD instruction is part of a relatively long chain of instructions.
    Type: Application
    Filed: January 22, 2001
    Publication date: July 25, 2002
    Inventor: Simon C. Steely
  • Publication number: 20020099833
    Abstract: A distributed processing system includes a cache coherency mechanism that essentially encodes network routing information into sectored presence bits. The mechanism organizes the sectored presence bits as one or more arbitration masks that system switches decode and use directly to route invalidate messages through one or more higher levels of the system. The lower level or levels of the system use local routing mechanisms, such as local directories, to direct the invalidate messages to the individual processors that are holding the data of interest.
    Type: Application
    Filed: January 24, 2001
    Publication date: July 25, 2002
    Inventors: Simon C. Steely, Stephen Van Doren, Madhumitra Sharma
  • Publication number: 20020009095
    Abstract: A technique decomposes a multicast transaction issued by one of a plurality of nodes of a distributed shared memory multiprocessor system into a series of multicast packets, each of which may further “spawn” multicast messages directed to a subset of the nodes. A central switch fabric interconnects the nodes, each of which includes a global port coupled to the switch, a plurality of processors and memory. The central switch includes a central ordering point that maintains an order of packets issued by, e.g., a source processor of a remote node when requesting data resident in a memory of a home node. The multicast messages spawned from a multicast packet passing the central ordering point are generated according to multicast decomposition and ordering rules of the inventive technique.
    Type: Application
    Filed: May 31, 2001
    Publication date: January 24, 2002
    Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma
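As a very rough picture of multicast decomposition, the sketch below splits one multicast's destination list into packets that each cover a bounded subset of nodes; each packet would then spawn the messages for its subset past the central ordering point. The fixed fan-out chunking is purely an illustrative assumption, not the patent's decomposition rule.

```python
def decompose_multicast(destinations, fan_out):
    """Split one multicast into packets that each target at most fan_out nodes."""
    return [destinations[i:i + fan_out] for i in range(0, len(destinations), fan_out)]


packets = decompose_multicast(["N0", "N1", "N2", "N3", "N4"], fan_out=2)
print(packets)  # [['N0', 'N1'], ['N2', 'N3'], ['N4']]
```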
  • Publication number: 20010055277
    Abstract: An initiate flow control mechanism prevents interconnect resources within a switch fabric of a modular multiprocessor system from being dominated with initiate transactions. The multiprocessor system comprises a plurality of nodes interconnected by a switch fabric that extends from a global input port of a node through a hierarchical switch to a global output port of the same or another node. The interconnect resources include shared buffers within the global ports and hierarchical switch. The initiate flow control mechanism manages these shared buffers to reserve bandwidth for complete transactions when extensive global initiate traffic to one or more nodes of the system may create a bottleneck in the switch fabric.
    Type: Application
    Filed: May 11, 2001
    Publication date: December 27, 2001
    Inventors: Simon C. Steely, Madhumitra Sharma, Stephen R. Van Doren, Gregory E. Tierney
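One way to picture the initiate flow control described above: the shared buffer admits initiate transactions only up to a cap, keeping the remaining slots reserved for complete transactions so heavy initiate traffic cannot starve them. The slot counts and admission rule below are assumptions for the sketch.

```python
class SharedBuffer:
    def __init__(self, slots, reserved_for_complete):
        self.free = slots
        self.initiate_cap = slots - reserved_for_complete
        self.initiates_in_flight = 0

    def admit(self, kind):
        if self.free == 0:
            return False
        if kind == "initiate" and self.initiates_in_flight >= self.initiate_cap:
            return False  # hold initiates so reserved slots stay free for complete transactions
        self.free -= 1
        if kind == "initiate":
            self.initiates_in_flight += 1
        return True


buf = SharedBuffer(slots=4, reserved_for_complete=2)
print([buf.admit("initiate") for _ in range(3)])  # [True, True, False]
print(buf.admit("complete"))                      # True
```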
  • Publication number: 20010049742
    Abstract: A flow control technique prevents overflow of a write storage structure, such as a first-in, first-out (FIFO) queue, in a centralized Duplicate Tag store arrangement of a multiprocessor system that includes a plurality of nodes interconnected by a central switch. Each node comprises a plurality of processors with associated caches and memories interconnected by a local switch. Each node further comprises a Duplicate Tag (DTAG) store that contains information about the state of data relative to all processors of a node. The DTAG comprises the write FIFO which has a limited number of entries. Flow control logic in the local switch keeps track of when those entries may be occupied to avoid overflowing the FIFO.
    Type: Application
    Filed: May 29, 2001
    Publication date: December 6, 2001
    Inventors: Simon C. Steely, Hari Krishan Nagpal, Stephen R. Van Doren
  • Patent number: 6295585
    Abstract: A multi-node computer network includes a plurality of nodes coupled together via a data link. Each of the nodes includes a local memory, which further comprises a shared portion. Certain items of data that are to be shared by the nodes are stored in the shared portion of memory. Associated with each of the shared data items is a data structure. When a node sharing data with other nodes in the system seeks to modify the data, it transmits the modifications over the data link to the other nodes in the network. Each update is received in order by each node in the cluster. As part of the last transmission by the modifying node, an acknowledgement request is sent to the receiving nodes in the cluster. Each node that receives the acknowledgement request returns an acknowledgement to the sending node. The returned acknowledgement is written to the data structure associated with the shared data item.
    Type: Grant
    Filed: June 7, 1995
    Date of Patent: September 25, 2001
    Assignee: Compaq Computer Corporation
    Inventors: Richard B. Gillett, Jr., Glenn P. Garvey, Simon C. Steely, Jr.
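The update-and-acknowledge flow in this last abstract is easy to sketch: a modifying node sends its updates in order over the link, flags the final one with an acknowledgement request, and each receiver records its acknowledgement in the data structure associated with the shared item. The class and field names are illustrative assumptions, not the patent's terminology.

```python
class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.shared = {}  # this node's copy of the shared portion of memory

    def receive(self, item, value, ack_requested, ack_sink):
        self.shared[item] = value
        if ack_requested:
            ack_sink[self.name] = True  # acknowledgement recorded for the sender


nodes = [ClusterNode("A"), ClusterNode("B")]
acks_for_item = {}  # the data structure associated with the shared item
updates = [("counter", 1, False), ("counter", 2, True)]  # last update carries the ack request
for item, value, ack_requested in updates:
    for n in nodes:
        n.receive(item, value, ack_requested, acks_for_item)
print(acks_for_item)  # {'A': True, 'B': True}
```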