Patents by Inventor Madhumitra Sharma
Madhumitra Sharma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 6961825Abstract: A distributed processing system includes a cache coherency mechanism that essentially encodes network routing information into sectored presence bits. The mechanism organizes the sectored presence bits as one or more arbitration masks that system switches decode and use directly to route invalidate messages through one or more higher levels of the system. The lower level or levels of the system use local routing mechanisms, such as local directories, to direct the invalidate messages to the individual processors that are holding the data of interest.Type: GrantFiled: January 24, 2001Date of Patent: November 1, 2005Assignee: Hewlett-Packard Development Company, L.P.Inventors: Simon C. Steely, Jr., Stephen Van Doren, Madhumitra Sharma
-
Patent number: 6904465Abstract: A multiple-processor system in which a commit message is returned to a source processor that requests a memory access operation so as to indicate the apparent completion of the operation includes a multiple-level switch unit linking nodes that contain the processors. The switch unit includes multiple input switches each of which receives messages from multiple nodes, and a set of output switches whose inputs are the outputs of the input switches and whose outputs are the inputs of the nodes. Each switch processes messages in the order in which they are received by the switch and each output switch follows the same rule as the other output switches.Type: GrantFiled: April 26, 2001Date of Patent: June 7, 2005Assignee: Hewlett-Packard Development Company, L.P.Inventors: Simon C. Steely, Jr., Madhumitra Sharma, Stephen R. Van Doren
-
Patent number: 6801986Abstract: A method, for executing a load locked and a store conditional instruction in a processor, achieves an atomic read-write operation to a memory block. First the load locked instruction is executed to read a memory block, and the processor in response to executing the load locked instruction issues a read modify system command to read the block and to take ownership of the block by the processor, and also sets a lock flag for the address of the memory block, and writes a value of the memory block into a cache of the processor as a cache copy of the memory block. The lock flag, upon receipt of an invalidate message by the processor for the cache copy of the memory block, is reset if any invalidate messages for the memory block are received by the processor. The processor waits for a selected time interval before the processor surrenders ownership of the memory block upon receipt of an ownership request message, if any is received by the processor after execution of the load locked instruction.Type: GrantFiled: August 20, 2001Date of Patent: October 5, 2004Assignee: Hewlett-Packard Development Company, L.P.Inventors: Simon C. Steely, Jr., Stephen R. Van Doren, Madhumitra Sharma
-
Publication number: 20030076831Abstract: A technique efficiently combines data and ordered transactions in a multiprocessor system having a plurality of nodes interconnected by a hierarchical switch. The technique further enables an ordered channel of the system to make progress in the presence of a blocked interface within the hierarchical switch. Specifically, the technique combines ordered components and unordered data components into common packets that are transmitted over an ordered channel of the system in the event that ordered and unordered components are generated simultaneously. The technique further allows, in the event that a combined packet in the ordered channel is stalled due to a data buffer dependency, the packet to be decomposed into an ordered component and an unordered data component wherein the ordered component remains in the ordered channel and the unordered data component is reassigned to the unordered data channel.Type: ApplicationFiled: March 21, 2001Publication date: April 24, 2003Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma
-
Publication number: 20030037223Abstract: A method, for executing a load locked and a store conditional instruction in a processor, achieves an atomic read-write operation to a memory block. First the load locked instruction is executed to read a memory block, and the processor in response to executing the load locked instruction issues a read modify system command to read the block and to take ownership of the block by the processor, and also sets a lock flag for the address of the memory block, and writes a value of the memory block into a cache of the processor as a cache copy of the memory block. The lock flag, upon receipt of an invalidate message by the processor for the cache copy of the memory block, is reset if any invalidate messages for the memory block are received by the processor. The processor waits for a selected time interval before the processor surrenders ownership of the memory block upon receipt of an ownership request message, if any is received by the processor after execution of the load locked instruction.Type: ApplicationFiled: August 20, 2001Publication date: February 20, 2003Inventors: Simon C. Steely, Stephen R. Van Doren, Madhumitra Sharma
-
Publication number: 20020194290Abstract: A multiple-processor system in which a commit message is returned to a source processor that requests a memory access operation so as to indicate the apparent completion of the operation includes a multiple-level switch unit linking nodes that contain the processors. The switch unit includes multiple input switches each of which receives messages from multiple nodes, and a set of output switches whose inputs are the outputs of the input switches and whose outputs are the inputs of the nodes. Each switch processes messages in the order in which they are received by the switch and each output switch follows the same rule as the other output switches.Type: ApplicationFiled: April 26, 2001Publication date: December 19, 2002Inventors: Simon C. Steely, Madhumitra Sharma, Stephen R. Van Doren
-
Publication number: 20020152358Abstract: A performance enhancing change-to-dirty operation (CTD) is disclosed wherein contention among several processors trying to gain ownership of a block of data is obviated by arranging the CTD to always succeed. A method and a system are disclosed where a processor in a multiprocessor system having a copy of data gains assured ownership of data that the processor may then write. The method provides for the possibilities of conditions that may exist and provides a scenario that the requesting processor may have to wait for the ownership. Conditions are handled where the memory is the “owner” of the data and where other processor are requesting ownership, and where copies of the data exist at other processors. The method provides for messages to other processor having copies of the data informing them that the data is now invalid.Type: ApplicationFiled: April 13, 2001Publication date: October 17, 2002Inventors: Simon C. Steely, Stephen R. Van Doren, Madhumitra Sharma
-
Publication number: 20020146022Abstract: A credit-based, flow control technique utilizes a plurality of counters to conserve resources of a switch fabric within a modular multiprocessor system while ensuring that transaction packets pending in virtual channel queues of the fabric efficiently progress through those resources. The multiprocessor system includes a plurality of nodes interconnected by the switch fabric that extends from a global input port of a node through a hierarchical switch to a global output port of the same or another node. The resources include shared buffers within the global ports and hierarchical switch. Each counter is associated with a virtual channel queue and the flow control technique uses the counters to essentially create the structure of the shared buffers.Type: ApplicationFiled: April 9, 2001Publication date: October 10, 2002Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma, Gregory E. Tierney
-
Publication number: 20020099833Abstract: A distributed processing system includes a cache coherency mechanism that essentially encodes network routing information into sectored presence bits. The mechanism organizes the sectored presence bits as one or more arbitration masks that system switches decode and use directly to route invalidate messages through one or more higher levels of the system. The lower level or levels of the system use local routing mechanisms, such as local directories, to direct the invalidate messages to the individual processors that are holding the data of interest.Type: ApplicationFiled: January 24, 2001Publication date: July 25, 2002Inventors: Simon C. Steely, Stephen Van Doren, Madhumitra Sharma
-
Publication number: 20020009095Abstract: A technique decomposes a multicast transaction issued by one of a plurality of nodes of a distributed shared memory multiprocessor system into a series of multicast packets, each of which may further “spawn” multicast messages directed to a subset of the nodes. A central switch fabric interconnects the nodes, each of which includes a global port coupled to the switch, a plurality of processors and memory. The central switch includes a central ordering point that maintains an order of packets issued by, e.g., a source processor of a remote node when requesting data resident in a memory of a home node. The multicast messages spawned from a multicast packet passing the central ordering point are generated according to multicast decomposition and ordering rules of the inventive technique.Type: ApplicationFiled: May 31, 2001Publication date: January 24, 2002Inventors: Stephen R. Van Doren, Simon C. Steely, Madhumitra Sharma
-
Publication number: 20010055277Abstract: An initiate flow control mechanism prevents interconnect resources within a switch fabric of a modular multiprocessor system from being dominated with initiate transactions. The multiprocessor system comprises a plurality of nodes interconnected by a switch fabric that extends from a global input port of a node through a hierarchical switch to a global output port of the same or another node. The interconnect resources include shared buffers within the global ports and hierarchical switch. The initiate flow control mechanism manages these shared buffers to reserve bandwidth for complete transactions when extensive global initiate traffic to one or more nodes of the system may create a bottleneck in the switch fabric.Type: ApplicationFiled: May 11, 2001Publication date: December 27, 2001Inventors: Simon C. Steely, Madhumitra Sharma, Stephen R. Van Doren, Gregory E. Tierney
-
Patent number: 6286090Abstract: A technique selectively imposes inter-reference ordering between memory reference operations issued by a processor of a multiprocessor system to addresses within a page pertaining to a page table entry (PTE) that is affected by a translation buffer (TB) miss flow routine. The TB miss flow is used to retrieve information contained in the PTE for mapping a virtual address to a physical address and, subsequently, to allow retrieval of data at the mapped physical address. The PTE that is retrieved in response to a memory reference (read) operation is not loaded into the TB until a commit-signal associated with that read operation is returned to the processor. Once the PTE and associated commit-signal are returned, the processor loads the PTE into the TB so that it can be used for a subsequent read operation directed to the data at the physical address.Type: GrantFiled: May 26, 1998Date of Patent: September 4, 2001Assignee: Compaq Computer CorporationInventors: Simon C. Steely, Jr., Madhumitra Sharma, Stephen R. Van Doren, Kourosh Gharachorloo
-
Patent number: 6279084Abstract: The invention pertains to serializing local and remote references to a portion of a shared memory to optimize sequencing of requests in a switch-based, multi-processor system in which the local and remote references can occur concurrently. Usually, local accesses are typically much faster than remote accesses. Thus, in the interest of performance, both local and remote accesses are permitted to occur concurrently in the multiprocessing system. However, in one instance a local access can cause deadlock problems for a remote access. In addition, problems associated with coherency of the shared memory can also arise. Thus, in order to prevent deadlock problems and to maintain coherency of a shared memory, if a local reference to an address of memory has been forwarded to a switch, in this instance a hierarchical switch, then all subsequent references to that address of memory are forwarded to the hierarchical switch. The hierarchical switch has ordering properties that maintain the received order of inputs.Type: GrantFiled: October 24, 1997Date of Patent: August 21, 2001Assignee: Compaq Computer CorporationInventors: Stephen R. VanDoren, Simon C. Steely, Madhumitra Sharma, Hari Krishnan Nagpal
-
Patent number: 6249520Abstract: An architecture and coherency protocol for use in a large SMP computer system includes a hierarchical switch structure which allows for a number of multi-processor nodes to be coupled to the switch to operate at an optimum performance. Within each multi-processor node, a simultaneous buffering system is provided that allows all of the processors of the multi-processor node to operate at peak performance. A memory is shared among the nodes, with a portion of the memory resident at each of the multi-processor nodes. Each of the multi-processor nodes includes a number of elements for maintaining memory coherency, including a victim cache, a directory and a transaction tracking table. The victim cache allows for selective updates of victim data destined for memory stored at a remote multi-processing node, thereby improving the overall performance of memory.Type: GrantFiled: October 24, 1997Date of Patent: June 19, 2001Assignee: Compaq Computer CorporationInventors: Simon C. Steely, Jr., Stephen R. VanDoren, Madhumitra Sharma, Craig D. Keefer, David W. Davis
-
Patent number: 6209065Abstract: A mechanism optimizes the generation of a commit-signal by control logic of the multiprocessor system in response to a memory reference operation issued by a processor to a local node of a multiprocessor system having a hierarchical switch for interconnecting a plurality of nodes. The mechanism generally comprises a structure that indicates whether the memory reference operation affects other processors of other nodes of the multiprocessor system. An ordering point of the local node generates an optimized commit-signal when the structure indicates that the memory reference operation does not affect the other processors.Type: GrantFiled: October 24, 1997Date of Patent: March 27, 2001Assignee: Compaq Computer CorporationInventors: Stephen R. Van Doren, Simon C. Steely, Jr., Kourosh Gharachorloo, Madhumitra Sharma
-
Patent number: 6202126Abstract: A method for preventing inadvertent invalidation of data elements in a system having a separate probe queue and fill queue for each central processing unit, is provided wherein a central processing unit stores a clean data element, that would otherwise have been discarded, in a victim data buffer when it is evicted from cache. The central processing unit subsequently issues a clean-victim command to the system control logic when the readmiss or read-miss-modify command, targeting the data element that maps to the same location in cache as the clean data element, is issued. The clean-victim command causes the duplicate tag store to indicate that the clean data element is no longer stored in that central processing unit's cache. While the data is stored therein, the central processing unit cannot issue a probe message that targets that data until the victim data buffer has been deallocated.Type: GrantFiled: October 24, 1997Date of Patent: March 13, 2001Assignee: Compaq Computer CorporationInventors: Stephen Van Doren, Simon C. Steely, Jr., Madhumitra Sharma
-
Patent number: 6154816Abstract: An architecture and coherency protocol for use in a large SMP computer system includes a hierarchical switch structure which allows for a number of multi-processor nodes to be coupled to the switch to operate at an optimum performance. Within each multi-processor node, a simultaneous buffering system is provided that allows all of the processors of the multi-processor node to operate at peak performance. A memory is shared among the nodes, with a portion of the memory resident at each of the multi-processor nodes. Each of the multi-processor nodes includes a number of elements for maintaining memory coherency, including a victim cache, a directory and a transaction tracking table. The victim cache allows for selective updates of victim data destined for memory stored at a remote multi-processing node, thereby improving the overall performance of memory.Type: GrantFiled: October 24, 1997Date of Patent: November 28, 2000Assignee: Compaq Computer Corp.Inventors: Simon C. Steely, Madhumitra Sharma, Stephen R. VanDoren
-
Patent number: 6122714Abstract: An architecture and coherency protocol for use in a large SMP computer system includes a hierarchical switch structure which allows for a number of multi-processor nodes to be coupled to the switch to operate at an optimum performance. Within each multi-processor node, a simultaneous buffering system is provided that allows all of the processors of the multi-processor node to operate at peak performance. A memory is shared among the nodes, with a portion of the memory resident at each of the multi-processor nodes. Each of the multi-processor nodes includes a number of elements for maintaining memory coherency, including a victim cache, a directory and a transaction tracking table. The victim cache allows for selective updates of victim data destined for memory stored at a remote multi-processing node, thereby improving the overall performance of memory.Type: GrantFiled: October 24, 1997Date of Patent: September 19, 2000Assignee: Compaq Computer Corp.Inventors: Stephen R. VanDoren, Simon C. Steely, Madhumitra Sharma, David M. Fenwick
-
Patent number: 6108737Abstract: A mechanism reduces the latency of inter-reference ordering between sets of memory reference operations in a multiprocessor system having a shared memory. The mechanism comprises a commit-signal that is generated by control logic of the multiprocessor system in response to an issued memory reference operation. The commit-signal facilitates inter-reference ordering; moreover, the commit signal indicates the apparent completion of the memory reference operation, rather than actual completion of the operation. The apparent completion of an operation occurs substantially sooner than the actual completion of an operation, thereby improving performance of the multiprocessor system.Type: GrantFiled: October 24, 1997Date of Patent: August 22, 2000Assignee: Compaq Computer CorporationInventors: Madhumitra Sharma, Stephen R. Van Doren, Kourosh Gharachorloo, Simon C. Steely, Jr.
-
Patent number: 6101420Abstract: An architecture and coherency protocol for use in a large SMP computer system includes a hierarchical switch structure which allows for a number of multi-processor nodes to be coupled to the switch to operate at an optimum performance. Within each multi-processor node, a simultaneous buffering system is provided that allows all of the processors of the multi-processor node to operate at peak performance. A memory is shared among the nodes, with a portion of the memory resident at each of the multi-processor nodes. Each of the multi-processor nodes includes a number of elements for maintaining memory coherency, including a victim cache, a directory and a transaction tracking table. The victim cache allows for selective updates of victim data destined for memory stored at a remote multi-processing node, thereby improving the overall performance of memory.Type: GrantFiled: October 24, 1997Date of Patent: August 8, 2000Assignee: Compaq Computer CorporationInventors: Stephen R. VanDoren, Simon C. Steely, Madhumitra Sharma, Kourosh Gharachorloo