Patents by Inventor Simon C. Steely

Simon C. Steely has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 6961825
    Abstract: A distributed processing system includes a cache coherency mechanism that essentially encodes network routing information into sectored presence bits. The mechanism organizes the sectored presence bits as one or more arbitration masks that system switches decode and use directly to route invalidate messages through one or more higher levels of the system. The lower level or levels of the system use local routing mechanisms, such as local directories, to direct the invalidate messages to the individual processors that are holding the data of interest.
    Type: Grant
    Filed: January 24, 2001
    Date of Patent: November 1, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Simon C. Steely, Jr., Stephen Van Doren, Madhumitra Sharma
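As a rough illustration of the arbitration-mask idea in the abstract above, the sketch below groups presence bits into sectors, one sector per top-level switch port, so the sector bits can be used directly to decide which ports receive invalidate messages. The sector width, function names, and bit layout are assumptions for illustration, not details from the patent.

```python
# Hypothetical layout: 4 presence bits per sector, one sector per switch port.
SECTOR_BITS = 4


def sectors_with_sharers(presence_bits, num_sectors):
    """Return the switch-port indices whose sector has at least one presence bit set."""
    ports = []
    for sector in range(num_sectors):
        sector_mask = ((1 << SECTOR_BITS) - 1) << (sector * SECTOR_BITS)
        if presence_bits & sector_mask:
            ports.append(sector)
    return ports


# Sharers behind ports 0 and 2: the switch fans invalidates out only to those
# ports; local directories below each port finish the routing.
presence = 0b0001_0000_0011
print(sectors_with_sharers(presence, num_sectors=3))  # [0, 2]
```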
  • Patent number: 6904465
    Abstract: A multiple-processor system in which a commit message is returned to a source processor that requests a memory access operation so as to indicate the apparent completion of the operation includes a multiple-level switch unit linking nodes that contain the processors. The switch unit includes multiple input switches each of which receives messages from multiple nodes, and a set of output switches whose inputs are the outputs of the input switches and whose outputs are the inputs of the nodes. Each switch processes messages in the order in which they are received by the switch and each output switch follows the same rule as the other output switches.
    Type: Grant
    Filed: April 26, 2001
    Date of Patent: June 7, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Simon C. Steely, Jr., Madhumitra Sharma, Stephen R. Van Doren
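To make the ordering rule in the abstract concrete, here is a minimal sketch that models each switch stage as a FIFO that forwards messages strictly in arrival order; with every stage following the same rule, a later commit message can never pass an earlier message on its way to a node. The class names are illustrative assumptions, not terminology from the patent.

```python
from collections import deque


class SwitchStage:
    """A switch modeled as a FIFO: messages leave in the order they arrived."""

    def __init__(self):
        self.queue = deque()

    def receive(self, msg):
        self.queue.append(msg)

    def forward_all(self, next_stage):
        while self.queue:
            next_stage.receive(self.queue.popleft())


class Node:
    def __init__(self):
        self.delivered = []

    def receive(self, msg):
        self.delivered.append(msg)


input_switch, output_switch, node = SwitchStage(), SwitchStage(), Node()
for m in ["invalidate A", "commit A"]:
    input_switch.receive(m)
input_switch.forward_all(output_switch)
output_switch.forward_all(node)
print(node.delivered)  # ['invalidate A', 'commit A'] -- the commit never passes the invalidate
```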
  • Patent number: 6801986
    Abstract: A method for executing a load locked and a store conditional instruction in a processor achieves an atomic read-write operation to a memory block. First, the load locked instruction is executed to read a memory block; in response, the processor issues a read modify system command to read the block and take ownership of it, sets a lock flag for the address of the memory block, and writes the value of the memory block into its cache as a cache copy of the memory block. The lock flag is reset if the processor receives any invalidate message for the cache copy of the memory block. If an ownership request message is received after execution of the load locked instruction, the processor waits for a selected time interval before surrendering ownership of the memory block.
    Type: Grant
    Filed: August 20, 2001
    Date of Patent: October 5, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Simon C. Steely, Jr., Stephen R. Van Doren, Madhumitra Sharma
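The load locked / store conditional flow described above can be sketched in a few lines. The following Python model is a simplification under assumed names (Processor, lock_flag, and store_conditional are illustrative, not the patent's terms): load locked takes ownership, caches the block, and sets a lock flag; any invalidate clears the flag; a conditional store succeeds only if the flag survived.

```python
class Processor:
    def __init__(self, memory):
        self.memory = memory          # shared memory, modeled as a dict
        self.cache = {}               # address -> cached value
        self.lock_flag = {}           # address -> bool

    def load_locked(self, addr):
        # Read with intent to modify: take ownership, keep a cache copy,
        # and set the lock flag for this address.
        value = self.memory[addr]
        self.cache[addr] = value
        self.lock_flag[addr] = True
        return value

    def on_invalidate(self, addr):
        # Any invalidate for the block resets the lock flag.
        self.cache.pop(addr, None)
        self.lock_flag[addr] = False

    def store_conditional(self, addr, value):
        # Succeeds only if no invalidate arrived since the load locked.
        if self.lock_flag.get(addr):
            self.cache[addr] = self.memory[addr] = value
            self.lock_flag[addr] = False
            return True
        return False


p = Processor(memory={0x100: 7})
old = p.load_locked(0x100)
print(p.store_conditional(0x100, old + 1))  # True: no invalidate intervened
```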
  • Patent number: 6769057
    Abstract: A “data verified”, or DV, bit is included in an instruction to indicate whether the instruction or a dependent instruction may be associated with the retrieved data as soon as the data is available or should instead be associated with the data only after verification. If the DV bit is in a first state, e.g., not set, the system may issue instructions that use associated data as soon as the data is available. If the DV bit is in a second state, e.g., set, the system does not issue the instructions that use the data until the data is verified. The system or user sets the DV bit based on an analysis of an instruction set that includes the instruction and/or accumulated profile data from previous use or uses of the software. The DV bit is set in a LOAD instruction if the dependent user instruction is close enough in the instruction set that the user instruction is likely to issue before the data is verified and/or if the LOAD instruction is part of a relatively long chain of instructions.
    Type: Grant
    Filed: January 22, 2001
    Date of Patent: July 27, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Simon C. Steely, Jr.
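The DV-bit policy in the abstract reduces to a simple decision: set the bit when a dependent is likely to issue before the loaded data can be verified, or when the LOAD heads a long dependence chain. The sketch below encodes that rule; the thresholds and names are assumptions chosen for illustration, not values from the patent.

```python
DISTANCE_THRESHOLD = 3   # hypothetical: dependent issues "close enough" to the LOAD
CHAIN_THRESHOLD = 8      # hypothetical: "relatively long" dependence chain


def should_set_dv_bit(distance_to_dependent, chain_length):
    """Set DV if the dependent is likely to issue before the data is verified,
    or if the LOAD heads a long chain of dependent instructions."""
    return distance_to_dependent <= DISTANCE_THRESHOLD or chain_length >= CHAIN_THRESHOLD


print(should_set_dv_bit(distance_to_dependent=2, chain_length=4))   # True
print(should_set_dv_bit(distance_to_dependent=10, chain_length=4))  # False
```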
  • Patent number: 6647466
    Abstract: A system for adaptively bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described. Each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache. When an invalidate request is received at a given cache hierarchy, each cache level is searched for the address specified by the invalidate request. When an address match is detected, the state of the respective cache line is changed to the invalid state, although the address of the cache line is left in the tag store. Thereafter, if the processor or entity associated with this cache hierarchy issues its own request for this same cache line, the cache hierarchy begins searching the tag store of each level starting with the lowest cache level.
    Type: Grant
    Filed: January 25, 2001
    Date of Patent: November 11, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Simon C. Steely, Jr.
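As a hedged illustration of the tag-store behavior in the abstract: an invalidate marks a line invalid but leaves its address in each level's tag store, and a later local request searches the levels starting from the lowest one. The class layout and the search order shown are assumptions for the sketch, not details taken from the patent.

```python
INVALID, VALID = "invalid", "valid"


class CacheLevel:
    def __init__(self, name):
        self.name = name
        self.tags = {}  # address -> state

    def invalidate(self, addr):
        if addr in self.tags:
            self.tags[addr] = INVALID  # state cleared, tag retained


def lookup(hierarchy, addr):
    # Search each level's tag store, starting with the lowest cache level.
    for level in reversed(hierarchy):
        if level.tags.get(addr) == VALID:
            return level.name
    return "miss"


l1, l2, l3 = CacheLevel("L1"), CacheLevel("L2"), CacheLevel("L3")
for lvl in (l1, l2, l3):
    lvl.tags[0x40] = VALID
    lvl.invalidate(0x40)
print(lookup([l1, l2, l3], 0x40))  # 'miss' -- every copy invalid, tags still present
```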
  • Patent number: 6636948
    Abstract: A performance enhancing change-to-dirty operation (CTD) is disclosed wherein contention among several processors trying to gain ownership of a block of data is obviated by arranging the CTD to always succeed. A method and a system are disclosed in which a processor in a multiprocessor system that holds a copy of data gains assured ownership of the data, which the processor may then write. The method accounts for the conditions that may exist, including a scenario in which the requesting processor may have to wait for ownership. Conditions are handled where the memory is the “owner” of the data, where other processors are requesting ownership, and where copies of the data exist at other processors. The method provides for messages to other processors having copies of the data, informing them that the data is now invalid.
    Type: Grant
    Filed: April 13, 2001
    Date of Patent: October 21, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Simon C. Steely, Jr., Stephen R. Van Doren, Madhumitra Sharma
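The always-succeeding change-to-dirty can be pictured as a single transition: whatever the starting conditions, the requester ends up as owner, other sharers receive invalidates, and in some cases the requester simply waits for ownership to arrive rather than retrying. The function below is a minimal sketch of that outcome under assumed state names; it is not the patent's protocol specification.

```python
def change_to_dirty(requester, owner, sharers):
    """Return (new_owner, invalidate_targets, must_wait) for an always-succeeding CTD.

    must_wait models the case where ownership is held elsewhere and the
    requester has to wait for it to be handed over rather than retry.
    """
    invalidates = {p for p in sharers if p != requester}
    must_wait = owner not in (None, "memory", requester)
    return requester, invalidates, must_wait


new_owner, to_invalidate, wait = change_to_dirty("P1", owner="memory", sharers={"P1", "P2", "P3"})
print(new_owner, sorted(to_invalidate), wait)  # P1 ['P2', 'P3'] False
```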
  • Publication number: 20030076831
    Abstract: A technique efficiently combines data and ordered transactions in a multiprocessor system having a plurality of nodes interconnected by a hierarchical switch. The technique further enables an ordered channel of the system to make progress in the presence of a blocked interface within the hierarchical switch. Specifically, the technique combines ordered components and unordered data components into common packets that are transmitted over an ordered channel of the system in the event that ordered and unordered components are generated simultaneously. The technique further allows, in the event that a combined packet in the ordered channel is stalled due to a data buffer dependency, the packet to be decomposed into an ordered component and an unordered data component wherein the ordered component remains in the ordered channel and the unordered data component is reassigned to the unordered data channel.
    Type: Application
    Filed: March 21, 2001
    Publication date: April 24, 2003
    Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma
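A small sketch of the combine-then-decompose behavior described in the abstract: an ordered component and an unordered data component generated together travel as one packet on the ordered channel, and if that packet stalls on a data-buffer dependency it is split, with the data half reassigned to the unordered channel. The data model and channel representation are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    ordered_part: str
    data_part: Optional[str] = None  # unordered data component, if any


def send(packet, ordered_channel, data_channel, stalled):
    if stalled and packet.data_part is not None:
        # Decompose: the ordered half stays on the ordered channel,
        # the data half is reassigned to the unordered data channel.
        ordered_channel.append(Packet(packet.ordered_part))
        data_channel.append(packet.data_part)
    else:
        ordered_channel.append(packet)


ordered, unordered = [], []
send(Packet("probe response", data_part="cache line 0x80"), ordered, unordered, stalled=True)
print(len(ordered), len(unordered))  # 1 1 -- the ordered channel keeps making progress
```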
  • Publication number: 20030037223
    Abstract: A method for executing a load locked and a store conditional instruction in a processor achieves an atomic read-write operation to a memory block. First, the load locked instruction is executed to read a memory block; in response, the processor issues a read modify system command to read the block and take ownership of it, sets a lock flag for the address of the memory block, and writes the value of the memory block into its cache as a cache copy of the memory block. The lock flag is reset if the processor receives any invalidate message for the cache copy of the memory block. If an ownership request message is received after execution of the load locked instruction, the processor waits for a selected time interval before surrendering ownership of the memory block.
    Type: Application
    Filed: August 20, 2001
    Publication date: February 20, 2003
    Inventors: Simon C. Steely, Stephen R. Van Doren, Madhumitra Sharma
  • Publication number: 20020194290
    Abstract: A multiple-processor system in which a commit message is returned to a source processor that requests a memory access operation so as to indicate the apparent completion of the operation includes a multiple-level switch unit linking nodes that contain the processors. The switch unit includes multiple input switches each of which receives messages from multiple nodes, and a set of output switches whose inputs are the outputs of the input switches and whose outputs are the inputs of the nodes. Each switch processes messages in the order in which they are received by the switch and each output switch follows the same rule as the other output switches.
    Type: Application
    Filed: April 26, 2001
    Publication date: December 19, 2002
    Inventors: Simon C. Steely, Madhumitra Sharma, Stephen R. Van Doren
  • Patent number: 6493801
    Abstract: An adaptive cache coherent purging protocol includes recognizing that system performance, especially latency, is affected by when a cache is purged. The occurrences of performance enhancing and degrading events regarding a cache are counted and compared to a threshold. When the threshold is triggered, the cache becomes a candidate for purging. In an embodiment, a time out delay is implemented before actual purging occurs. When the threshold is not triggered but a cache event occurs, a fake time out delay is triggered and the count is adaptively either raised, lowered or set to zero in response to performance enhancing and/or degrading events. The effect is to make the actual purging more likely if the history of cache events indicates that performance would be enhanced thereby, or less likely if the history indicates that performance would be degraded thereby.
    Type: Grant
    Filed: January 26, 2001
    Date of Patent: December 10, 2002
    Assignee: Compaq Computer Corporation
    Inventors: Simon C. Steely, Jr., Nikolaos Hardavellas
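The adaptive purge policy above amounts to training a counter: performance enhancing events push a cache toward being a purge candidate, degrading events push it away, and crossing a threshold starts the purge time-out while anything short of it triggers only a "fake" time-out that keeps training the count. The sketch below encodes that loop with an assumed threshold and unit adjustments; the actual adjustments in the patent may differ.

```python
THRESHOLD = 4  # hypothetical trigger point


class PurgeCandidate:
    def __init__(self):
        self.count = 0

    def on_event(self, enhancing):
        # Enhancing events raise the count; degrading events lower it (never below zero).
        self.count = self.count + 1 if enhancing else max(0, self.count - 1)
        if self.count >= THRESHOLD:
            return "start purge time-out"
        return "fake time-out (keep training the count)"


c = PurgeCandidate()
for enhancing in [True, True, False, True, True, True]:
    decision = c.on_event(enhancing)
print(decision)  # 'start purge time-out'
```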
  • Publication number: 20020152358
    Abstract: A performance enhancing change-to-dirty operation (CTD) is disclosed wherein contention among several processors trying to gain ownership of a block of data is obviated by arranging the CTD to always succeed. A method and a system are disclosed in which a processor in a multiprocessor system that holds a copy of data gains assured ownership of the data, which the processor may then write. The method accounts for the conditions that may exist, including a scenario in which the requesting processor may have to wait for ownership. Conditions are handled where the memory is the “owner” of the data, where other processors are requesting ownership, and where copies of the data exist at other processors. The method provides for messages to other processors having copies of the data, informing them that the data is now invalid.
    Type: Application
    Filed: April 13, 2001
    Publication date: October 17, 2002
    Inventors: Simon C. Steely, Stephen R. Van Doren, Madhumitra Sharma
  • Publication number: 20020146022
    Abstract: A credit-based, flow control technique utilizes a plurality of counters to conserve resources of a switch fabric within a modular multiprocessor system while ensuring that transaction packets pending in virtual channel queues of the fabric efficiently progress through those resources. The multiprocessor system includes a plurality of nodes interconnected by the switch fabric that extends from a global input port of a node through a hierarchical switch to a global output port of the same or another node. The resources include shared buffers within the global ports and hierarchical switch. Each counter is associated with a virtual channel queue and the flow control technique uses the counters to essentially create the structure of the shared buffers.
    Type: Application
    Filed: April 9, 2001
    Publication date: October 10, 2002
    Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma, Gregory E. Tierney
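Credit-based flow control of the kind described above can be reduced to a counter per virtual channel: a packet is launched only if a credit (a free slot in the shared downstream buffer) is available, and the credit is returned when the slot frees up. The buffer size and method names below are assumptions for the sketch.

```python
class CreditedChannel:
    def __init__(self, credits):
        self.credits = credits  # free slots in the shared downstream buffer

    def try_send(self, packet):
        if self.credits == 0:
            return False   # would overflow the shared buffer: hold the packet
        self.credits -= 1  # consume a credit for the slot the packet will occupy
        return True

    def on_credit_return(self):
        self.credits += 1  # downstream freed a slot


vc0 = CreditedChannel(credits=2)
print(vc0.try_send("req A"), vc0.try_send("req B"), vc0.try_send("req C"))  # True True False
vc0.on_credit_return()
print(vc0.try_send("req C"))  # True
```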
  • Publication number: 20020103976
    Abstract: An adaptive cache coherent purging protocol includes recognizing that system performance, especially latency, is affected by when a cache is purged. The occurrences of performance enhancing and degrading events regarding a cache are counted and compared to a threshold. When the threshold is triggered, the cache becomes a candidate for purging. In an embodiment, a time out delay is implemented before actual purging occurs. When the threshold is not triggered but a cache event occurs, a fake time out delay is triggered and the count is adaptively either raised, lowered or set to zero in response to performance enhancing and/or degrading events. The effect is to make the actual purging more likely if the history of cache events indicates that performance would be enhanced thereby, or less likely if the history indicates that performance would be degraded thereby.
    Type: Application
    Filed: January 26, 2001
    Publication date: August 1, 2002
    Inventors: Simon C. Steely, Nikolaos Hardavellas
  • Publication number: 20020099913
    Abstract: A system for adaptively bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described. Each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache. When an invalidate request is received at a given cache hierarchy, each cache level is searched for the address specified by the invalidate request. When an address match is detected, the state of the respective cache line is changed to the invalid state, although the address of the cache line is left in the tag store. Thereafter, if the processor or entity associated with this cache hierarchy issues its own request for this same cache line, the cache hierarchy begins searching the tag store of each level starting with the lowest cache level.
    Type: Application
    Filed: January 25, 2001
    Publication date: July 25, 2002
    Inventor: Simon C. Steely
  • Publication number: 20020099927
    Abstract: A “data verified”, or DV, bit is included in an instruction to indicate whether the instruction or a dependent instruction may be associated with the retrieved data as soon as the data is available or should instead be associated with the data only after verification. If the DV bit is in a first state, e.g., not set, the system may issue instructions that use associated data as soon as the data is available. If the DV bit is in a second state, e.g., set, the system does not issue the instructions that use the data until the data is verified. The system or user sets the DV bit based on an analysis of an instruction set that includes the instruction and/or accumulated profile data from previous use or uses of the software. The DV bit is set in a LOAD instruction if the dependent user instruction is close enough in the instruction set that the user instruction is likely to issue before the data is verified and/or if the LOAD instruction is part of a relatively long chain of instructions.
    Type: Application
    Filed: January 22, 2001
    Publication date: July 25, 2002
    Inventor: Simon C. Steely
  • Publication number: 20020099833
    Abstract: A distributed processing system includes a cache coherency mechanism that essentially encodes network routing information into sectored presence bits. The mechanism organizes the sectored presence bits as one or more arbitration masks that system switches decode and use directly to route invalidate messages through one or more higher levels of the system. The lower level or levels of the system use local routing mechanisms, such as local directories, to direct the invalidate messages to the individual processors that are holding the data of interest.
    Type: Application
    Filed: January 24, 2001
    Publication date: July 25, 2002
    Inventors: Simon C. Steely, Stephen Van Doren, Madhumitra Sharma
  • Publication number: 20020009095
    Abstract: A technique decomposes a multicast transaction issued by one of a plurality of nodes of a distributed shared memory multiprocessor system into a series of multicast packets, each of which may further “spawn” multicast messages directed to a subset of the nodes. A central switch fabric interconnects the nodes, each of which includes a global port coupled to the switch, a plurality of processors and memory. The central switch includes a central ordering point that maintains an order of packets issued by, e.g., a source processor of a remote node when requesting data resident in a memory of a home node. The multicast messages spawned from a multicast packet passing the central ordering point are generated according to multicast decomposition and ordering rules of the inventive technique.
    Type: Application
    Filed: May 31, 2001
    Publication date: January 24, 2002
    Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma
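As a very rough picture of multicast decomposition, the sketch below splits one multicast's destination list into packets that each cover a bounded subset of nodes; each packet would then spawn the messages for its subset past the central ordering point. The fixed fan-out chunking is purely an illustrative assumption, not the patent's decomposition rule.

```python
def decompose_multicast(destinations, fan_out):
    """Split one multicast into packets that each target at most fan_out nodes."""
    return [destinations[i:i + fan_out] for i in range(0, len(destinations), fan_out)]


packets = decompose_multicast(["N0", "N1", "N2", "N3", "N4"], fan_out=2)
print(packets)  # [['N0', 'N1'], ['N2', 'N3'], ['N4']]
```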
  • Publication number: 20010055277
    Abstract: An initiate flow control mechanism prevents interconnect resources within a switch fabric of a modular multiprocessor system from being dominated with initiate transactions. The multiprocessor system comprises a plurality of nodes interconnected by a switch fabric that extends from a global input port of a node through a hierarchical switch to a global output port of the same or another node. The interconnect resources include shared buffers within the global ports and hierarchical switch. The initiate flow control mechanism manages these shared buffers to reserve bandwidth for complete transactions when extensive global initiate traffic to one or more nodes of the system may create a bottleneck in the switch fabric.
    Type: Application
    Filed: May 11, 2001
    Publication date: December 27, 2001
    Inventors: Simon C. Steely, Madhumitra Sharma, Stephen R. Van Doren, Gregory E. Tierney
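One way to picture the initiate flow control described above: the shared buffer admits initiate transactions only up to a cap, keeping the remaining slots reserved for complete transactions so heavy initiate traffic cannot starve them. The slot counts and admission rule below are assumptions for the sketch.

```python
class SharedBuffer:
    def __init__(self, slots, reserved_for_complete):
        self.free = slots
        self.initiate_cap = slots - reserved_for_complete
        self.initiates_in_flight = 0

    def admit(self, kind):
        if self.free == 0:
            return False
        if kind == "initiate" and self.initiates_in_flight >= self.initiate_cap:
            return False  # hold initiates so reserved slots stay free for complete transactions
        self.free -= 1
        if kind == "initiate":
            self.initiates_in_flight += 1
        return True


buf = SharedBuffer(slots=4, reserved_for_complete=2)
print([buf.admit("initiate") for _ in range(3)])  # [True, True, False]
print(buf.admit("complete"))                      # True
```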
  • Publication number: 20010049742
    Abstract: A flow control technique prevents overflow of a write storage structure, such as a first-in, first-out (FIFO) queue, in a centralized Duplicate Tag store arrangement of a multiprocessor system that includes a plurality of nodes interconnected by a central switch. Each node comprises a plurality of processors with associated caches and memories interconnected by a local switch. Each node further comprises a Duplicate Tag (DTAG) store that contains information about the state of data relative to all processors of a node. The DTAG comprises the write FIFO which has a limited number of entries. Flow control logic in the local switch keeps track of when those entries may be occupied to avoid overflowing the FIFO.
    Type: Application
    Filed: May 29, 2001
    Publication date: December 6, 2001
    Inventors: Simon C. Steely, Hari Krishan Nagpal, Stephen R. Van Doren
  • Patent number: 6295585
    Abstract: A multi-node computer network includes a plurality of nodes coupled together via a data link. Each of the nodes includes a local memory, which further comprises a shared portion. Certain items of data that are to be shared by the nodes are stored in the shared portion of memory. Associated with each of the shared data items is a data structure. When a node sharing data with other nodes in the system seeks to modify the data, it transmits the modifications over the data link to the other nodes in the network. Each update is received in order by each node in the cluster. As part of the last transmission by the modifying node, an acknowledgement request is sent to the receiving nodes in the cluster. Each node that receives the acknowledgement request returns an acknowledgement to the sending node. The returned acknowledgement is written to the data structure associated with the shared data item.
    Type: Grant
    Filed: June 7, 1995
    Date of Patent: September 25, 2001
    Assignee: Compaq Computer Corporation
    Inventors: Richard B. Gillett, Jr., Glenn P. Garvey, Simon C. Steely, Jr.
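The update-and-acknowledge flow in this last abstract is easy to sketch: a modifying node sends its updates in order over the link, flags the final one with an acknowledgement request, and each receiver records its acknowledgement in the data structure associated with the shared item. The class and field names are illustrative assumptions, not the patent's terminology.

```python
class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.shared = {}  # this node's copy of the shared portion of memory

    def receive(self, item, value, ack_requested, ack_sink):
        self.shared[item] = value
        if ack_requested:
            ack_sink[self.name] = True  # acknowledgement recorded for the sender


nodes = [ClusterNode("A"), ClusterNode("B")]
acks_for_item = {}  # the data structure associated with the shared item
updates = [("counter", 1, False), ("counter", 2, True)]  # last update carries the ack request
for item, value, ack_requested in updates:
    for n in nodes:
        n.receive(item, value, ack_requested, acks_for_item)
print(acks_for_item)  # {'A': True, 'B': True}
```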