Patents by Inventor Kourosh Gharachorloo

Kourosh Gharachorloo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20020087806
    Abstract: The present invention relates generally to a protocol engine for use in a multiprocessor computer system. The protocol engine, which implements a cache coherence protocol, includes a clock signal generator for generating signals denoting interleaved even clock periods and odd clock periods, a memory transaction state array for storing entries, each denoting the state of a respective memory transaction, and processing logic. The memory transactions are divided into even and odd transactions whose states are stored in distinct sets of entries in the memory transaction state array. The processing logic has interleaving circuitry for processing during even clock periods the even memory transactions and for processing during odd clock periods the odd memory transactions. Moreover, the protocol engine is configured to transition from one memory transaction to another in a minimum number of clock cycles.
    Type: Application
    Filed: January 7, 2002
    Publication date: July 4, 2002
    Inventors: Kourosh Gharachorloo, Luiz A. Barroso, Mosur K. Ravishankar, Robert J. Stets
  • Publication number: 20020083274
    Abstract: An invalid-to-dirty request permits a transition from an invalid memory state to a dirty state without requiring an up-to-date copy of the memory. The present invention is a system for supporting invalid-to-dirty memory transactions in an aggressive cache coherence protocol that minimizes directory entry locking. The nodes of a multiprocessor system each include a protocol engine that is configured to implement a distinct invalidation request that corresponds to an invalid-to-dirty memory transaction. If node O receives this distinct invalidation request while waiting for a response to an outstanding request for exclusive ownership, the protocol engine of node O is configured to treat the distinct invalidation request as applying to the memory line of information that is the subject of the outstanding request for exclusive ownership.
    Type: Application
    Filed: January 7, 2002
    Publication date: June 27, 2002
    Inventors: Kourosh Gharachorloo, Luiz Andre Barroso, Mosur Kumaraswamy Ravishankar, Robert J. Stets, Daniel J. Scales
  • Patent number: 6412056
    Abstract: A software distributed shared memory system includes a translation lookaside buffer extended to include fine-grain memory block-state bits associated with each block of information within a page stored in memory. The block-state bits provide multiple block states for each block. The block-state bits are used to check the state of each block, thereby alleviating the need for software checks and reducing checking overheads associated therewith.
    Type: Grant
    Filed: October 1, 1997
    Date of Patent: June 25, 2002
    Assignee: Compaq Information Technologies Group, LP
    Inventors: Kourosh Gharachorloo, Daniel J. Scales
  • Publication number: 20020046327
    Abstract: The present invention relates generally to a protocol engine for use in a multiprocessor computer system. The protocol engine, which implements a cache coherence protocol, includes a clock signal generator for generating signals denoting interleaved even clock periods and odd clock periods, a memory transaction state array for storing entries, each denoting the state of a respective memory transaction, and processing logic. The memory transactions are divided into even and odd transactions whose states are stored in distinct sets of entries in the memory transaction state array. The processing logic has interleaving circuitry for processing during even clock periods the even memory transactions and for processing during odd clock periods the odd memory transactions.
    Type: Application
    Filed: June 11, 2001
    Publication date: April 18, 2002
    Inventors: Kourosh Gharachorloo, Luiz A. Barroso, Mosur K. Ravishankar, Robert J. Stets, Andreas Nowatzyk
  • Publication number: 20020046324
    Abstract: A chip-multiprocessing system with scalable architecture, including on a single chip: a plurality of processor cores; a two-level cache hierarchy; an intra-chip switch; one or more memory controllers; a cache coherence protocol; one or more coherence protocol engines; and an interconnect subsystem. The two-level cache hierarchy includes first level and second level caches. In particular, the first level caches include a pair of instruction and data caches for, and private to, each processor core. The second level cache has a relaxed inclusion property, the second-level cache being logically shared by the plurality of processor cores. Each of the plurality of processor cores is capable of executing an instruction set of the ALPHA™ processing core. The scalable architecture of the chip-multiprocessing system is targeted at parallel commercial workloads.
    Type: Application
    Filed: June 8, 2001
    Publication date: April 18, 2002
    Inventors: Luiz Andre Barroso, Kourosh Gharachorloo, Andreas Nowatzyk
  • Publication number: 20020010836
    Abstract: To maximize the effective use of on-chip cache, a method and system for exclusive two-level caching in a chip-multiprocessor are provided. The exclusive two-level caching in accordance with the present invention involves relaxing the inclusion requirement in a two-level cache system in order to form an exclusive cache hierarchy. Additionally, the exclusive two-level caching involves providing a first-level tag-state structure in a first-level cache of the two-level cache system. The first tag-state structure has state information. The exclusive two-level caching also involves maintaining in a second-level cache of the two-level cache system a duplicate of the first-level tag-state structure and extending the state information in the duplicate of the first tag-state structure, but not in the first-level tag-state structure itself, to include an owner indication.
    Type: Application
    Filed: June 8, 2001
    Publication date: January 24, 2002
    Inventors: Luiz Andre Barroso, Kourosh Gharachorloo, Andreas Nowatzyk
  • Publication number: 20020010840
    Abstract: A computer system has a plurality of processor nodes and a plurality of input/output nodes. Each processor node includes a multiplicity of processor cores, an interface to a local memory system and a protocol engine implementing a predefined cache coherence protocol. Each processor core has an associated memory cache for caching memory lines of information. Each input/output node includes no processor cores, an input/output interface for interfacing to an input/output bus or input/output device, a memory cache for caching memory lines of information and an interface to a local memory subsystem. The local memory subsystem of each processor node and input/output node stores a multiplicity of memory lines of information. The protocol engine of each processor node and input/output node implements the same predefined cache coherence protocol.
    Type: Application
    Filed: June 11, 2001
    Publication date: January 24, 2002
    Inventors: Luiz A. Barroso, Kourosh Gharachorloo, Andreas Nowatzyk, Mosur K. Ravishankar, Robert J. Stets
  • Publication number: 20020007439
    Abstract: A protocol engine is for use in each node of a computer system having a plurality of nodes. Each node includes an interface to a local memory subsystem that stores memory lines of information, a directory, and a memory cache. The directory includes an entry associated with a memory line of information stored in the local memory subsystem. The directory entry includes an identification field for identifying sharer nodes that potentially cache the memory line of information. The identification field has a plurality of bits at associated positions within the identification field. Each respective bit of the identification field is associated with one or more nodes. The protocol engine furthermore sets each bit in the identification field for which the memory line is cached in at least one of the associated nodes.
    Type: Application
    Filed: June 11, 2001
    Publication date: January 17, 2002
    Inventors: Kourosh Gharachorloo, Luiz A. Barroso, Robert J. Stets, Mosur K. Ravishankar, Andreas Nowatzyk
  • Publication number: 20020007443
    Abstract: The present invention relates generally to multiprocessor computer systems, and particularly to a multiprocessor system designed to be highly scalable, using efficient cache coherence logic and methodologies. More specifically, the present invention is a system and method including a plurality of processor nodes configured to execute a cache coherence protocol that avoids the use of negative acknowledgment messages (NAKs) and ordering requirements on the underlying transaction-message interconnect/network and services most 3-hop transactions with only a single visit to the home node.
    Type: Application
    Filed: June 11, 2001
    Publication date: January 17, 2002
    Inventors: Kourosh Gharachorloo, Luiz A. Barroso, Mosur K. Ravishankar, Robert J. Stets, Daniel J. Scales
  • Patent number: 6286090
    Abstract: A technique selectively imposes inter-reference ordering between memory reference operations issued by a processor of a multiprocessor system to addresses within a page pertaining to a page table entry (PTE) that is affected by a translation buffer (TB) miss flow routine. The TB miss flow is used to retrieve information contained in the PTE for mapping a virtual address to a physical address and, subsequently, to allow retrieval of data at the mapped physical address. The PTE that is retrieved in response to a memory reference (read) operation is not loaded into the TB until a commit-signal associated with that read operation is returned to the processor. Once the PTE and associated commit-signal are returned, the processor loads the PTE into the TB so that it can be used for a subsequent read operation directed to the data at the physical address.
    Type: Grant
    Filed: May 26, 1998
    Date of Patent: September 4, 2001
    Assignee: Compaq Computer Corporation
    Inventors: Simon C. Steely, Jr., Madhumitra Sharma, Stephen R. Van Doren, Kourosh Gharachorloo
  • Patent number: 6209065
    Abstract: A mechanism optimizes the generation of a commit-signal by control logic of the multiprocessor system in response to a memory reference operation issued by a processor to a local node of a multiprocessor system having a hierarchical switch for interconnecting a plurality of nodes. The mechanism generally comprises a structure that indicates whether the memory reference operation affects other processors of other nodes of the multiprocessor system. An ordering point of the local node generates an optimized commit-signal when the structure indicates that the memory reference operation does not affect the other processors.
    Type: Grant
    Filed: October 24, 1997
    Date of Patent: March 27, 2001
    Assignee: Compaq Computer Corporation
    Inventors: Stephen R. Van Doren, Simon C. Steely, Jr., Kourosh Gharachorloo, Madhumitra Sharma
  • Patent number: 6108737
    Abstract: A mechanism reduces the latency of inter-reference ordering between sets of memory reference operations in a multiprocessor system having a shared memory. The mechanism comprises a commit-signal that is generated by control logic of the multiprocessor system in response to an issued memory reference operation. The commit-signal facilitates inter-reference ordering; moreover, the commit signal indicates the apparent completion of the memory reference operation, rather than actual completion of the operation. The apparent completion of an operation occurs substantially sooner than the actual completion of an operation, thereby improving performance of the multiprocessor system.
    Type: Grant
    Filed: October 24, 1997
    Date of Patent: August 22, 2000
    Assignee: Compaq Computer Corporation
    Inventors: Madhumitra Sharma, Stephen R. Van Doren, Kourosh Gharachorloo, Simon C. Steely, Jr.
  • Patent number: 6101420
    Abstract: An architecture and coherency protocol for use in a large SMP computer system includes a hierarchical switch structure which allows for a number of multi-processor nodes to be coupled to the switch to operate at an optimum performance. Within each multi-processor node, a simultaneous buffering system is provided that allows all of the processors of the multi-processor node to operate at peak performance. A memory is shared among the nodes, with a portion of the memory resident at each of the multi-processor nodes. Each of the multi-processor nodes includes a number of elements for maintaining memory coherency, including a victim cache, a directory and a transaction tracking table. The victim cache allows for selective updates of victim data destined for memory stored at a remote multi-processing node, thereby improving the overall performance of memory.
    Type: Grant
    Filed: October 24, 1997
    Date of Patent: August 8, 2000
    Assignee: Compaq Computer Corporation
    Inventors: Stephen R. VanDoren, Simon C. Steely, Madhumitra Sharma, Kourosh Gharachorloo
  • Patent number: 6088771
    Abstract: A technique reduces the latency of a memory barrier (MB) operation used to impose an inter-reference order between sets of memory reference operations issued by a processor to a multiprocessor system having a shared memory. The technique comprises issuing the MB operation immediately after issuing a first set of memory reference operations (i.e., the pre-MB operations) without waiting for responses to those pre-MB operations. Issuance of the MB operation to the system results in serialization of that operation and generation of a MB Acknowledgment (MB-Ack) command. The MB-Ack is loaded into a probe queue of the issuing processor and, according to the invention, functions to pull-in all previously ordered invalidate and probe commands in that queue. By ensuring that the probes and invalidates are ordered before the MB-Ack is received at the issuing processor, the inventive technique provides the appearance that all pre-MB references have completed.
    Type: Grant
    Filed: October 24, 1997
    Date of Patent: July 11, 2000
    Assignee: Digital Equipment Corporation
    Inventors: Simon C. Steely, Jr., Madhumitra Sharma, Kourosh Gharachorloo, Stephen R. Van Doren
  • Patent number: 6085263
    Abstract: An improved I/O processor (IOP) delivers high I/O performance while maintaining inter-reference ordering among memory reference operations issued by an I/O device as specified by a consistency model in a shared memory multiprocessor system. The IOP comprises a retire controller which imposes inter-reference ordering among the operations based on receipt of a commit signal for each operation, wherein the commit signal for a memory reference operation indicates the apparent completion of the operation rather than actual completion of the operation. In addition, the IOP comprises a prefetch controller coupled to an I/O cache for prefetching data into cache without any ordering constraints (or out-of-order). The ordered retirement functions of the IOP are separated from its prefetching operations, which enables the latter operations to be performed in an arbitrary manner so as to improve the overall performance of the system.
    Type: Grant
    Filed: October 24, 1997
    Date of Patent: July 4, 2000
    Assignee: Compaq Computer Corp.
    Inventors: Madhumitra Sharma, Chester Pawlowski, Kourosh Gharachorloo, Stephen R. Van Doren, Simon C. Steely, Jr.
  • Patent number: 6055605
    Abstract: A technique reduces the latency of inter-reference ordering between sets of memory reference operations in a multiprocessor system having a shared memory that is distributed among a plurality of processors that share a cache. According to the technique, each processor sharing a cache inherits a commit-signal that is generated by control logic of the multiprocessor system in response to a memory reference operation issued by another processor sharing that cache. The commit-signal facilitates serialization among the processors and shared memory entities of the multiprocessor system by indicating the apparent completion of the memory reference operation to those entities of the system.
    Type: Grant
    Filed: October 24, 1997
    Date of Patent: April 25, 2000
    Assignee: Compaq Computer Corporation
    Inventors: Madhumitra Sharma, Simon C. Steely, Jr., Kourosh Gharachorloo, Stephen R. Van Doren
  • Patent number: 5950228
    Abstract: In a distributed shared memory system, clusters of symmetric multi-processors are connected to each other by a network. Each symmetric multi-processor includes a plurality of processors, a memory having addresses, and an input/output interface to interconnect the processors. A software implemented method enables data sharing between the clusters of symmetric multi-processors using variable sized quantities of data called blocks. A set of the addresses of the memories are designated as virtual shared addresses to store shared data, and a portion of the virtual shared addresses are allocated to store a shared data structure as one or more blocks. The size of a particular allocated block can vary for different shared data structures. Each block includes an integer number of lines, and each line includes a predetermined number of bytes of shared data. Directory information of a particular block is stored in the memory of a processor designated as the home of the block.
    Type: Grant
    Filed: February 3, 1997
    Date of Patent: September 7, 1999
    Assignee: Digital Equipment Corporation
    Inventors: Daniel J. Scales, Kourosh Gharachorloo, Anshu Aggarwal
  • Patent number: 5933598
    Abstract: In a distributed shared memory system, workstations are connected to each other by a network. Each workstation includes a processor, a memory having addresses, and an input/output interface to interconnect the workstations. A software implemented method enables data sharing between the workstations using variable sized quantities of data. A set of the addresses of the memories are designated as virtual shared addresses to store shared data. A portion of the virtual shared addresses are allocated to store a shared data structure as one or more blocks. The shared data structure is accessible by programs executing in any of the processors. The size of a particular allocated block can vary for different shared data structures. Each block includes an integer number of lines, and each line includes a predetermined number of bytes of shared data. Access information of a particular block is stored in the memory of a home one of the workstations.
    Type: Grant
    Filed: July 17, 1996
    Date of Patent: August 3, 1999
    Assignee: Digital Equipment Corporation
    Inventors: Daniel J. Scales, Kourosh Gharachorloo
  • Patent number: 5787480
    Abstract: A software implemented method for lock-up free data sharing operates in a networked computer system including a plurality of workstations. Each workstation includes a processor, a memory having addresses, and an input/output interface connected to each other by a bus. A set of addresses of the memories are designated as virtual shared addresses to store shared data. A portion of the virtual shared addresses is allocated to store the shared data as a plurality of blocks accessible by programs executing in any of the processors, each block including an integer number of lines. A program is instrumented to request an exclusive copy of the block if the program includes a store instruction which attempts to access data stored in a non-exclusive copy of the block. Additional instructions of the program are executed while the request for the exclusive copy of the block is pending. Addresses of data of the block modified by the additional instructions are recorded.
    Type: Grant
    Filed: July 17, 1996
    Date of Patent: July 28, 1998
    Assignee: Digital Equipment Corporation
    Inventors: Daniel J. Scales, Kourosh Gharachorloo
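
The lock-up free store described in the abstract of patent 5787480 can be illustrated with a minimal sketch. It is not the patented implementation. The class name Block, the 64-byte line size, and the final merge step in exclusive_arrived are assumptions made for illustration; the abstract states only that an exclusive copy is requested, that execution continues while the request is pending, and that the modified addresses are recorded.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumed; the abstracts say only "a predetermined number of bytes"


@dataclass
class Block:
    """Hypothetical shared block made of a whole number of fixed-size lines."""
    base: int                 # first virtual shared address of the block
    num_lines: int            # a block is an integer number of lines
    exclusive: bool = False   # True once this node holds the exclusive copy
    pending: bool = False     # True while an exclusive request is outstanding
    data: bytearray = field(default_factory=bytearray)
    dirty: set = field(default_factory=set)  # addresses stored to while pending

    def __post_init__(self):
        if not self.data:
            self.data = bytearray(self.num_lines * LINE_BYTES)

    def store(self, addr, value):
        """Instrumented store: never stalls, even without an exclusive copy."""
        if not self.exclusive and not self.pending:
            self.pending = True        # stand-in for sending the request message
        if self.pending:
            self.dirty.add(addr)       # record the address modified while pending
        self.data[addr - self.base] = value

    def exclusive_arrived(self, fresh):
        """Assumed merge: adopt the arriving copy, keep bytes written while pending."""
        merged = bytearray(fresh)
        for addr in self.dirty:
            merged[addr - self.base] = self.data[addr - self.base]
        self.data = merged
        self.exclusive, self.pending = True, False
        self.dirty.clear()
```

In this sketch a store to a non-exclusive block sets the pending flag and returns immediately, so later instructions keep running. The recorded addresses are what allow the locally written bytes to be kept when the exclusive copy is eventually installed.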