Patents by Inventor William E. Speight
William E. Speight has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20110292594Abstract: A scalable space-optimized and energy-efficient computing system is provided. The computing system comprises a plurality of modular compartments in at least one level of a frame configured in a hexadron configuration. The computing system also comprises an air inlet, an air mixing plenum, and at least one fan. In the computing system the plurality of modular compartments are affixed above the air inlet, the air mixing plenum is affixed above the plurality of modular compartments, and the at least one fan is affixed above the air mixing plenum. When at least one module is inserted into one of the plurality of modular compartments, the module couples to a backplane within the frame.Type: ApplicationFiled: May 28, 2010Publication date: December 1, 2011Applicant: International Business Machines CorporationInventors: John B. Carter, Wael R. El-Essawy, Elmootazbellah N. Elnozahy, Madhusudan K. Iyengar, Thomas W. Keller, JR., Jian Li, Karthick Rajamani, Juan C. Rubio, William E. Speight, Lixin Zhang
-
Publication number: 20110296112Abstract: Mechanisms for accessing a set associative cache of a data processing system are provided. A set of cache lines, in the set associative cache, associated with an address of a request are identified. Based on a determined mode of operation for the set, the following may be performed: determining if a cache hit occurs in a preferred cache line without accessing other cache lines in the set of cache lines; retrieving data from the preferred cache line without accessing the other cache lines in the set of cache lines, if it is determined that there is a cache hit in the preferred cache line; and accessing each of the other cache lines in the set of cache lines to determine if there is a cache hit in any of these other cache lines only in response to there being a cache miss in the preferred cache line(s).Type: ApplicationFiled: May 25, 2010Publication date: December 1, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jian Li, William E. Speight, Lixin Zhang
-
Publication number: 20110292597Abstract: A modular processing module is provided. The modular processing module comprises a set of processing module sides. Each processing module side comprises a circuit board, a plurality of connectors coupled to the circuit board, and a plurality of processing nodes coupled to the circuit board. Each processing module side in the set of processing module sides couples to another processing module side using at least one connector in the plurality of connectors such that, when all of the set of processing module sides are coupled together, the modular processing module is formed. The modular processing module comprises an exterior connection to a power source and a communication system.Type: ApplicationFiled: May 28, 2010Publication date: December 1, 2011Applicant: International Business Machines CorporationInventors: John B. Carter, Wael R. El-Essawy, Elmootazbellah N. Elnozahy, Wesley M. Felter, Madhusudan K. Iyengar, Thomas W. Keller, JR., Karthick Rajamani, Juan C. Rubio, William E. Speight, Lixin Zhang
-
Publication number: 20110296097Abstract: Mechanisms are provided for inhibiting precharging of memory cells of a dynamic random access memory (DRAM) structure. The mechanisms receive a command for accessing memory cells of the DRAM structure. The mechanisms further determine, based on the command, if precharging the memory cells following accessing the memory cells is to be inhibited. Moreover, the mechanisms send, in response to the determination indicating that precharging the memory cells is to be inhibited, a command to blocking logic of the DRAM structure to block precharging of the memory cells following accessing the memory cells.Type: ApplicationFiled: May 27, 2010Publication date: December 1, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Elmootazbellah N. Elnozahy, Karthick Rajamani, William E. Speight, Lixin Zhang
-
Publication number: 20110296107Abstract: A mechanism is provided within a 3D stacked memory organization to spread or stripe cache lines across multiple layers. In an example organization, a 128B cache line takes eight cycles on a 16B-wide bus. Each layer may provide 32B. The first layer uses the first two of the eight transfer cycles to send the first 32B. The next layer sends the next 32B using the next two cycles of the eight transfer cycles, and so forth. The mechanism provides a uniform memory access.Type: ApplicationFiled: May 26, 2010Publication date: December 1, 2011Applicant: International Business Machines CorporationInventors: Jian Li, William E. Speight, Lixin Zhang
-
Publication number: 20110296115Abstract: A mechanism is provided for assigning memory to on-chip cache coherence domains. The mechanism assigns caches within a processing unit to coherence domains. The mechanism then assigns chunks of memory to the coherence domains. The mechanism monitors applications running on cores within the processing unit to identify needs of the applications. The mechanism may then reassign memory chunks to the cache coherence domains based on the needs of the applications running in the coherence domains. When a memory controller receives the cache miss, the memory controller may look up the address in a lookup table that maps memory chunks to cache coherence domains. Snoop requests are sent to caches within the coherence domain. If a cache line is found in a cache within the coherence domain, the cache line is returned to the originating cache by the cache containing the cache line either directly or through the memory controller.Type: ApplicationFiled: May 26, 2010Publication date: December 1, 2011Applicant: International Business Machines CorporationInventors: William E. Speight, Lixin Zhang
-
Publication number: 20110238946Abstract: A virtual address scheme for improving performance and efficiency of memory accesses of sparsely-stored data items in a cached memory system is disclosed. In a preferred embodiment of the present invention, a special address translation unit is used to translate sets of non-contiguous addresses in real memory into contiguous blocks of addresses in an “intermediate address space.” This intermediate address space is a fictitious or “virtual” address space, but is distinguishable from the virtual address space visible to application programs, and in user-level memory operations, effective addresses seen/manipulated by application programs are translated into intermediate addresses by an additional address translation unit for memory caching purposes. This scheme allows non-contiguous data items in memory to be assembled into contiguous cache lines for more efficient caching/access (due to the perceived spatial proximity of the data from the perspective of the processor).Type: ApplicationFiled: March 24, 2010Publication date: September 29, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ramakrishnan Rajamony, William E. Speight, Lixin Zhang
-
Patent number: 7996564Abstract: A distributed data processing system executes multiple tasks within a parallel job, including a first local task on a local node and at least one task executing on a remote node, with a remote memory having real address (RA) locations mapped to one or more of the source effective addresses (EA) and destination EA of a data move operation initiated by a task executing on the local node. On initiation of the data move operation, remote asynchronous data move (RADM) logic identifies that the operation moves data to/from a first EA that is memory mapped to an RA of the remote memory. The local processor/RADM logic initiates a RADM operation that moves a copy of the data directly from/to the first remote memory by completing the RADM operation using the network interface cards (NICs) of the source and destination processing nodes, determined by accessing a data center for the node IDs of remote memory.Type: GrantFiled: April 16, 2009Date of Patent: August 9, 2011Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ronald N. Kalla, Ramakrishnan Rajamony, Balaram Sinharoy, William E. Speight, William J. Starke
-
Publication number: 20110145509Abstract: A technique for performing stream detection and prefetching within a cache memory simplifies stream detection and prefetching. A bit in a cache directory or cache entry indicates that a cache line has not been accessed since being prefetched and another bit indicates the direction of a stream associated with the cache line. A next cache line is prefetched when a previously prefetched cache line is accessed, so that the cache always attempts to prefetch one cache line ahead of accesses, in the direction of a detected stream. Stream detection is performed in response to load misses tracked in the load miss queue (LMQ). The LMQ stores an offset indicating a first miss at the offset within a cache line. A next miss to the line sets a direction bit based on the difference between the first and second offsets and causes prefetch of the next line for the stream.Type: ApplicationFiled: February 9, 2011Publication date: June 16, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: William E. Speight, Lixin Zhang
-
Patent number: 7958183Abstract: A mechanism for performing collective operations. In software executing on a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In software executing on the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.Type: GrantFiled: August 27, 2007Date of Patent: June 7, 2011Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
-
Patent number: 7958182Abstract: A mechanism is provided for performing collective operations. In hardware of a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In hardware of the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.Type: GrantFiled: August 27, 2007Date of Patent: June 7, 2011Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
-
Patent number: 7958317Abstract: A technique for performing stream detection and prefetching within a cache memory simplifies stream detection and prefetching. A bit in a cache directory or cache entry indicates that a cache line has not been accessed since being prefetched and another bit indicates the direction of a stream associated with the cache line. A next cache line is prefetched when a previously prefetched cache line is accessed, so that the cache always attempts to prefetch one cache line ahead of accesses, in the direction of a detected stream. Stream detection is performed in response to load misses tracked in the load miss queue (LMQ). The LMQ stores an offset indicating a first miss at the offset within a cache line. A next miss to the line sets a direction bit based on the difference between the first and second offsets and causes prefetch of the next line for the stream.Type: GrantFiled: August 4, 2008Date of Patent: June 7, 2011Assignee: International Business Machines CorporationInventors: William E. Speight, Lixin Zhang
-
Patent number: 7958316Abstract: A method, processor, and data processing system for dynamically adjusting a prefetch stream priority based on the consumption rate of the data by the processor. The method includes a prefetch engine issuing a prefetch request of a first prefetch stream to fetch one or more data from the memory subsystem. The first prefetch stream has a first assigned priority that determines a relative order for scheduling prefetch requests of the first prefetch stream relative to other prefetch requests of other prefetch streams. Based on the receipt of a processor demand for the data before the data returns to the cache or return of the data along time before the receiving the processor demand, logic of the prefetch engine dynamically changes the first assigned priority to a second higher or lower priority, which priority is subsequently utilized to schedule and issue a next prefetch request of the first prefetch stream.Type: GrantFiled: February 1, 2008Date of Patent: June 7, 2011Assignee: International Business Machines CorporationInventors: William E. Speight, Lixin Zhang
-
Publication number: 20110072214Abstract: A mechanism is provided in a cache for providing a read and write aware cache. The mechanism partitions a large cache into a read-often region and a write-often region. The mechanism considers read/write frequency in a non-uniform cache architecture replacement policy. A frequently written cache line is placed in one of the farther banks. A frequently read cache line is placed in one of the closer banks. The size ratio between read-often and write-often regions may be static or dynamic. The boundary between the read-often region and the write-often region may be distinct or fuzzy.Type: ApplicationFiled: September 18, 2009Publication date: March 24, 2011Applicant: International Business Machines CorporationInventors: Jian Li, Ramakrishnan Rajamony, William E. Speight, Lixin Zhang
-
Patent number: 7904590Abstract: A mechanism is provided for routing information through the data processing system. Data is received at a source processor within a set of processors that is to be transmitted to a destination processor, where the data includes address information. A first determination is performed as to whether the destination processor is within a same processor book as the source processor based on the address information. A second determination is performed as to whether the destination processor is within a same supernode as the source processor based on the address information if the destination processor is not within the same processor book. A routing path is identified for the data based on results of the first determination, the second determination, and one or more routing table data structures. The data is then transmitted from the source processor along the identified routing path toward the destination processor.Type: GrantFiled: August 27, 2007Date of Patent: March 8, 2011Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
-
Publication number: 20110022773Abstract: A mechanism is provided in a virtual machine monitor for fine grained cache allocation in a shared cache. The mechanism partitions a cache tag into a most significant bit (MSB) portion and a least significant bit (LSB) portion. The MSB portion of the tags is shared among the cache lines in a set. The LSB portion of the tags is private, one per cache line. The mechanism allows software to set the MSB portion of tags in a cache to allocate sets of cache lines. The cache controller determines whether a cache line is locked based on the MSB portion of the tag.Type: ApplicationFiled: July 27, 2009Publication date: January 27, 2011Applicant: International Business Machines CorporationInventors: Ramakrishnan Rajamony, William E. Speight, Lixin Zhang
-
Publication number: 20110004875Abstract: A method, a system, an apparatus, and a computer program product for allocating resources of one or more shared devices to one or more partitions of a virtualization environment within a data processing system. At least one user defined resource assignment is received for one or more devices associated with the data processing system. One or more registers, associated with the one or more partitions are dynamically set to execute the at least one resource assignment, whereby the at least one resource assignment enables a user defined quantitative measure (number and/or percentage) of devices to operate when the one or more transactions are executed via the partition. The system enables the one or more devices to execute one or more transactions at a bandwidth/capacity that is less than or equal to the user defined resource assignment and minimizes performance interference among partitions.Type: ApplicationFiled: July 1, 2009Publication date: January 6, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Elmootazbellah N. Elnozahy, Ramakrishnan Rajamony, William E. Speight, Lixin Zhang
-
Publication number: 20100293339Abstract: A method of data processing in a processor includes maintaining a usage history indicating demand usage of prefetched data retrieved into cache memory. An amount of data to prefetch by a data prefetch request is selected based upon the usage history. The data prefetch request is transmitted to a memory hierarchy to prefetch the selected amount of data into cache memory.Type: ApplicationFiled: February 1, 2008Publication date: November 18, 2010Inventors: RAVI K. ARIMILLI, Gheorghe C. Cascaval, Balaram Sinharoy, William E. Speight, Lixin Zhang
-
Publication number: 20100268788Abstract: A distributed data processing system executes multiple tasks within a parallel job, including a first local task on a local node and at least one task executing on a remote node, with a remote memory having real address (RA) locations mapped to one or more of the source effective addresses (EA) and destination EA of a data move operation initiated by a task executing on the local node. On initiation of the data move operation, remote asynchronous data move (RADM) logic identifies that the operation moves data to/from a first EA that is memory mapped to an RA of the remote memory. The local processor/RADM logic initiates a RADM operation that moves a copy of the data directly from/to the first remote memory by completing the RADM operation using the network interface cards (NICs) of the source and destination processing nodes, determined by accessing a data center for the node IDs of remote memory.Type: ApplicationFiled: April 16, 2009Publication date: October 21, 2010Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ronald N. Kalla, Ramakrishnan Rajamony, Balaram Sinharoy, William E. Speight, William J. Starke
-
Patent number: 7783870Abstract: A processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a level one branch target address cache (BTAC) and a level two BTAC each having a respective plurality of entries each associating at least a tag with a predicted branch target address. The branch logic accesses the level one and level two BTACs in parallel with a tag portion of a first instruction fetch address to obtain a first predicted branch target address from the level one BTAC for use as a second instruction fetch address in a first processor clock cycle and a second predicted branch target address from the level two BTAC for use as a third instruction fetch address in a later second processor clock cycle.Type: GrantFiled: August 13, 2007Date of Patent: August 24, 2010Assignee: International Business Machines CorporationInventors: David S. Levitan, William E. Speight, Lixin Zhang