Patents by Inventor Bernard C. Drerup
Bernard C. Drerup has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8756270Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.Type: GrantFiled: April 24, 2012Date of Patent: June 17, 2014Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
-
Patent number: 8751655Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.Type: GrantFiled: March 29, 2010Date of Patent: June 10, 2014Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
-
Patent number: 8417778Abstract: A mechanism is provided for collective acceleration unit tree flow control forms a logical tree (sub-network) among those processors and transfers “collective” packets on this tree. The system supports many collective trees, and each collective acceleration unit (CAU) includes resources to support a subset of the trees. Each CAU has limited buffer space, and the connection between two CAUs is not completely reliable. Therefore, to address the challenge of collective packets traversing on the tree without colliding with each other for buffer space and guaranteeing the end-to-end packet delivery, each CAU in the system effectively flow controls the packets, detects packet loss, and retransmits lost packets.Type: GrantFiled: December 17, 2009Date of Patent: April 9, 2013Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Jody B. Joyner, Paul F. Lecocq, Hanhong Xue
-
Publication number: 20120296915Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.Type: ApplicationFiled: April 24, 2012Publication date: November 22, 2012Applicant: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
-
Patent number: 8302109Abstract: A synchronization optimized queuing method and device to minimize software/hardware interaction in network interface hardware during an end-of-initiative process, including network adapter queue implementations for network interface hardware for optimized communication in a computer system. An end-of-initiative procedure to ensure that the network interface hardware has received an interrupt enable and to recheck the interrupt queue is unnecessary in the present invention.Type: GrantFiled: February 24, 2009Date of Patent: October 30, 2012Assignee: International Business Machines CorporationInventors: Lakshminarayana Arimilli, Claude Basso, Piyush Chaudhary, Bernard C. Drerup, Jody B. Joyner, Jan-Bernd Themann, Christoph Raisch, Colin B. Verrilli
-
Patent number: 8077602Abstract: Mechanisms for performing dynamic request routing based on broadcast depth queue information are provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide queue depth information to each of the other processor chips in the system. The queue depth information identifies a number of requests or amount of data in each of the queues of a processor chip that originated the heartbeat signal. The queue depth information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data.Type: GrantFiled: February 1, 2008Date of Patent: December 13, 2011Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis
-
Publication number: 20110238956Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.Type: ApplicationFiled: March 29, 2010Publication date: September 29, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
-
Patent number: 7987437Abstract: A design structure for piggybacking multiple data tenures on a single data bus grant to achieve higher bus utilization is disclosed. In one embodiment of the design structure, a method in a computer-aided design system includes a source device sending a request for a bus grant to deliver data to a data bus connecting a source device and a destination device. The device receives the bus grant and logic within the device determines whether the bandwidth of the data bus allocated to the bus grant will be filled by the data. If the bandwidth of the data bus allocated to the bus grant will not be filled by the data, the device appends additional data to the first data and delivers the combined data to the data bus during the bus grant for the first data. When the bandwidth of the data bus allocated to the bus grant will be filled by the first data, the device delivers only the first data to the data bus during the bus grant.Type: GrantFiled: April 30, 2008Date of Patent: July 26, 2011Assignee: International Business Machines CorporationInventors: Bernard C. Drerup, Richard Nicholas
-
Publication number: 20110173258Abstract: A mechanism is provided for collective acceleration unit tree flow control forms a logical tree (sub-network) among those processors and transfers “collective” packets on this tree. The system supports many collective trees, and each collective acceleration unit (CAU) includes resources to support a subset of the trees. Each CAU has limited buffer space, and the connection between two CAUs is not completely reliable. Therefore, to address the challenge of collective packets traversing on the tree without colliding with each other for buffer space and guaranteeing the end-to-end packet delivery, each CAU in the system effectively flow controls the packets, detects packet loss, and retransmits lost packets.Type: ApplicationFiled: December 17, 2009Publication date: July 14, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Jody B. Joyner, Paul F. Lecocq, Hanhong Xue
-
Patent number: 7921316Abstract: Mechanisms for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.Type: GrantFiled: September 11, 2007Date of Patent: April 5, 2011Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis
-
Patent number: 7827428Abstract: A system for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.Type: GrantFiled: August 31, 2007Date of Patent: November 2, 2010Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis
-
Patent number: 7779148Abstract: A mechanism for performing dynamic request routing based on broadcast source request information is provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide source request information to each of the other processor chips in the system. The source request information identifies the number of active source requests sent by the processor chip that originated the heartbeat signal. The source request information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data.Type: GrantFiled: February 1, 2008Date of Patent: August 17, 2010Assignee: International Business Machines CorporationInventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis
-
Patent number: 7685373Abstract: A system and structure for snooping cache memories of several snooping masters connected to a bus macro, wherein each non-originating snooping master has a cache memory, and wherein some, but less than all the cache memories, may have the data requested by an originating snooping master and wherein the needed data in an non-originating snooping master is marked as updated, and wherein a main memory having addresses for all data is connected to the bus macro. Only those non-originating snooping masters which may have the requested data are queried. All the non-originating snooping masters that have been queried reply. If a non-originating snooping master has the requested data marked as updated, that non-originating snooping master returns the updated data to the originating snooping master and possibly to the main memory. If none of the non-originating snooping masters has the requested data marked as updated, then the requested data is read from main memory.Type: GrantFiled: January 8, 2008Date of Patent: March 23, 2010Assignee: International Business Machines CorporationInventors: James N. Dieffenderfer, Bernard C. Drerup, Jaya P. Ganasan, Richard G. Hofmann, Thomas A. Sartorius, Thomas P. Speier, Barry J. Wolford
-
Patent number: 7668996Abstract: An improved method, device and data processing system are presented. In one embodiment, the method includes a source device sending a request for a bus grant to deliver data to a data bus connecting a source device and a destination device. The device receives the bus grant and logic within the device determines whether the bandwidth of the data bus allocated to the bus grant will be filled by the data. If the bandwidth of the data bus allocated to the bus grant will not be filled by the data, the device appends additional data to the first data and delivers the combined data to the data bus during the bus grant for the first data. When the bandwidth of the data bus allocated to the bus grant will be filled by the first data, the device delivers only the first data to the data bus during the bus grant.Type: GrantFiled: October 23, 2007Date of Patent: February 23, 2010Assignee: International Business Machines CorporationInventors: Bernard C. Drerup, Richard Nicholas
-
Patent number: 7620749Abstract: A DMA device prefetches descriptors into a descriptor prefetch buffer. The size of descriptor prefetch buffer holds an appropriate number of descriptors for a given latency environment. To support a linked list of descriptors, the DMA engine prefetches descriptors based on the assumption that they are sequential in memory and discards any descriptors that are found to violate this assumption. The DMA engine seeks to keep the descriptor prefetch buffer full by requesting multiple descriptors per transaction whenever possible. The bus engine fetches these descriptors from system memory and writes them to the prefetch buffer. The DMA engine may also use an aggressive prefetch where the bus engine requests the maximum number of descriptors that the buffer will support whenever there is any space in the descriptor prefetch buffer. The DMA device discards any remaining descriptors that cannot be stored.Type: GrantFiled: January 10, 2007Date of Patent: November 17, 2009Assignee: International Business Machines CorporationInventors: Giora Biran, Luis E. De la Torre, Bernard C. Drerup, Jyoti Gupta, Richard Nicholas
-
Patent number: 7603490Abstract: A direct memory access (DMA) device includes a barrier and interrupt mechanism that allows interrupt and mailbox operations to occur in such a way that ensures correct operation, but still allows for high performance out-of-order data moves to occur whenever possible. Certain descriptors are defined to be “barrier descriptors.” When the DMA device encounters a barrier descriptor, it ensures that all of the previous descriptors complete before the barrier descriptor completes. The DMA device further ensures that any interrupt generated by a barrier descriptor will not assert until the data move associated with the barrier descriptor completes. The DMA controller only permits interrupts to be generated by barrier descriptors. The barrier descriptor concept also allows software to embed mailbox completion messages into the scatter/gather linked list of descriptors.Type: GrantFiled: January 10, 2007Date of Patent: October 13, 2009Assignee: International Business Machines CorporationInventors: Giora Biran, Luis E. De la Torre, Bernard C. Drerup, Jyoti Gupta, Richard Nicholas
-
Publication number: 20090198957Abstract: A system and method for performing dynamic request routing based on broadcast depth queue information are provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide queue depth information to each of the other processor chips in the system. The queue depth information identifies a number of requests or amount of data in each of the queues of a processor chip that originated the heartbeat signal. The queue depth information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data.Type: ApplicationFiled: February 1, 2008Publication date: August 6, 2009Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis
-
Publication number: 20090198958Abstract: A system and method for performing dynamic request routing based on broadcast source request information are provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide source request information to each of the other processor chips in the system. The source request information identifies the number of active source requests sent by the processor chip that originated the heartbeat signal. The source request information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data.Type: ApplicationFiled: February 1, 2008Publication date: August 6, 2009Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis
-
Publication number: 20090106466Abstract: A design structure for piggybacking multiple data tenures on a single data bus grant to achieve higher bus utilization is disclosed. In one embodiment of the design structure, a method in a computer-aided design system includes a source device sending a request for a bus grant to deliver data to a data bus connecting a source device and a destination device. The device receives the bus grant and logic within the device determines whether the bandwidth of the data bus allocated to the bus grant will be filled by the data. If the bandwidth of the data bus allocated to the bus grant will not be filled by the data, the device appends additional data to the first data and delivers the combined data to the data bus during the bus grant for the first data. When the bandwidth of the data bus allocated to the bus grant will be filled by the first data, the device delivers only the first data to the data bus during the bus grant.Type: ApplicationFiled: April 30, 2008Publication date: April 23, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: BERNARD C. DRERUP, Richard Nicholas
-
Publication number: 20090106465Abstract: An improved method, device and data processing system are presented. In one embodiment, the method includes a source device sending a request for a bus grant to deliver data to a data bus connecting a source device and a destination device. The device receives the bus grant and logic within the device determines whether the bandwidth of the data bus allocated to the bus grant will be filled by the data. If the bandwidth of the data bus allocated to the bus grant will not be filled by the data, the device appends additional data to the first data and delivers the combined data to the data bus during the bus grant for the first data. When the bandwidth of the data bus allocated to the bus grant will be filled by the first data, the device delivers only the first data to the data bus during the bus grant.Type: ApplicationFiled: October 23, 2007Publication date: April 23, 2009Inventors: Bernard C. Drerup, Richard Nicholas