Patents by Inventor Peter J. Bannon
Peter J. Bannon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9354886Abstract: A processor and method for maintaining the integrity of an execution return address stack (RAS). The execution RAS is maintained in an accurate state by storing information regarding branch instructions in a branch information table. The first time a branch instruction is executed, an entry is allocated and populated in the table. If the branch instruction is re-executed, a pointer address is retrieved from the corresponding table entry and the execution RAS pointer is repositioned to the retrieved pointer address. The execution RAS can also be used to restore a speculative RAS due to a mis-speculation.Type: GrantFiled: November 28, 2011Date of Patent: May 31, 2016Assignee: Apple Inc.Inventors: Ramesh B. Gunna, Peter J. Bannon, Andrew J. Beaumont-Smith
-
Patent number: 9280352Abstract: An apparatus and method for avoiding bubbles and maintaining a maximum instruction throughput rate when cracking microcode instructions. A lookahead pointer scans the newest entries of a dispatch queue for microcode instructions. A detected microcode instruction is conveyed to a microcode engine to be cracked into a sequence of micro-ops. Then, the sequence of micro-ops is placed in a queue, and when the original microcode instruction entry in the dispatch queue is selected for dispatch, the sequence of micro-ops is dispatched to the next stage of the processor pipeline.Type: GrantFiled: November 30, 2011Date of Patent: March 8, 2016Assignee: Apple Inc.Inventors: Ramesh B. Gunna, Peter J. Bannon, Rajat Goel
-
Patent number: 9009451Abstract: A system and method for reducing power consumption through issue throttling of selected problematic instructions. A power throttle unit within a processor maintains instruction issue counts for associated instruction types. The instruction types may be a subset of supported instruction types executed by an execution core within the processor. The instruction types may be chosen based on high power consumption estimates for processing instructions of these types. The power throttle unit may determine a given instruction issue count exceeds a given threshold. In response, the power throttle unit may select given instruction types to limit a respective issue rate. The power throttle unit may choose an issue rate for each one of the selected given instruction types and limit an associated issue rate to a chosen issue rate. The selection of given instruction types and associated issue rate limits is programmable.Type: GrantFiled: October 31, 2011Date of Patent: April 14, 2015Assignee: Apple Inc.Inventors: Daniel C. Murray, Andrew J. Beaumont-Smith, John H. Mylius, Peter J. Bannon, Toshi Takayanagi, Jung Wook Cho
-
Patent number: 8566528Abstract: In an embodiment, a combining write buffer is configured to maintain one or more flush metrics to determine when to transmit write operations from buffer entries. The combining write buffer may be configured to dynamically modify the flush metrics in response to activity in the write buffer, modifying the conditions under which write operations are transmitted from the write buffer to the next lower level of memory. For example, in one implementation, the flush metrics may include categorizing write buffer entries as “collapsed.” A collapsed write buffer entry, and the collapsed write operations therein, may include at least one write operation that has overwritten data that was written by a previous write operation in the buffer entry. In another implementation, the combining write buffer may maintain the threshold of buffer fullness as a flush metric and may adjust it over time based on the actual buffer fullness.Type: GrantFiled: December 10, 2012Date of Patent: October 22, 2013Assignee: Apple Inc.Inventors: Peter J. Bannon, Andrew J. Beaumont-Smith, Ramesh B. Gunna, Wei-han Lien, Brian P. Lilly, Jaidev P. Patwardhan, Shih-Chieh R. Wen, Tse-Yu Yeh
-
Publication number: 20130138924Abstract: An apparatus and method for avoiding bubbles and maintaining a maximum instruction throughput rate when cracking microcode instructions. A lookahead pointer scans the newest entries of a dispatch queue for microcode instructions. A detected microcode instruction is conveyed to a microcode engine to be cracked into a sequence of micro-ops. Then, the sequence of micro-ops is placed in a queue, and when the original microcode instruction entry in the dispatch queue is selected for dispatch, the sequence of micro-ops is dispatched to the next stage of the processor pipeline.Type: ApplicationFiled: November 30, 2011Publication date: May 30, 2013Inventors: Ramesh B. Gunna, Peter J. Bannon, Rajat Goel
-
Publication number: 20130138931Abstract: A processor and method for maintaining the integrity of an execution return address stack (RAS). The execution RAS is maintained in an accurate state by storing information regarding branch instructions in a branch information table. The first time a branch instruction is executed, an entry is allocated and populated in the table. If the branch instruction is re-executed, a pointer address is retrieved from the corresponding table entry and the execution RAS pointer is repositioned to the retrieved pointer address. The execution RAS can also be used to restore a speculative RAS due to a mis-speculation.Type: ApplicationFiled: November 28, 2011Publication date: May 30, 2013Inventors: Ramesh B. Gunna, Peter J. Bannon, Andrew J. Beaumont-Smith
-
Publication number: 20130111191Abstract: A system and method for reducing power consumption through issue throttling of selected problematic instructions. A power throttle unit within a processor maintains instruction issue counts for associated instruction types. The instruction types may be a subset of supported instruction types executed by an execution core within the processor. The instruction types may be chosen based on high power consumption estimates for processing instructions of these types. The power throttle unit may determine a given instruction issue count exceeds a given threshold. In response, the power throttle unit may select given instruction types to limit a respective issue rate. The power throttle unit may choose an issue rate for each one of the selected given instruction types and limit an associated issue rate to a chosen issue rate. The selection of given instruction types and associated issue rate limits is programmable.Type: ApplicationFiled: October 31, 2011Publication date: May 2, 2013Inventors: Daniel C. Murray, Andrew J. Beaumont-Smith, John H. Mylius, Peter J. Bannon, Toshi Takayanagi, Jung Wook Cho
-
Patent number: 8364936Abstract: In an embodiment, a scheduler implements a first dependency array that tracks dependencies on instruction operations (ops) within a distance N of a given op and which are short execution latency ops. Other dependencies are tracked in a second dependency array. The first dependency array may evaluate quickly, to support back-to-back issuance of short execution latency ops and their dependent ops. The second array may evaluate more slowly than the first dependency array.Type: GrantFiled: July 25, 2012Date of Patent: January 29, 2013Assignee: Apple Inc.Inventors: Andrew J. Beaumont-Smith, Honkai Tam, Daniel C. Murray, John H. Mylius, Peter J. Bannon, Pradeep Kanapathipillai
-
Patent number: 8352685Abstract: In an embodiment, a combining write buffer is configured to maintain one or more flush metrics to determine when to transmit write operations from buffer entries. The combining write buffer may be configured to dynamically modify the flush metrics in response to activity in the write buffer, modifying the conditions under which write operations are transmitted from the write buffer to the next lower level of memory. For example, in one implementation, the flush metrics may include categorizing write buffer entries as “collapsed.” A collapsed write buffer entry, and the collapsed write operations therein, may include at least one write operation that has overwritten data that was written by a previous write operation in the buffer entry. In another implementation, the combining write buffer may maintain the threshold of buffer fullness as a flush metric and may adjust it over time based on the actual buffer fullness.Type: GrantFiled: August 20, 2010Date of Patent: January 8, 2013Assignee: Apple Inc.Inventors: Peter J. Bannon, Andrew J. Beaumont-Smith, Ramesh Gunna, Wei-han Lien, Brian P. Lilly, Jaidev P. Patwardhan, Shih-Chieh R. Wen, Tse-Yu Yeh
-
Publication number: 20120290818Abstract: In an embodiment, a scheduler implements a first dependency array that tracks dependencies on instruction operations (ops) within a distance N of a given op and which are short execution latency ops. Other dependencies are tracked in a second dependency array. The first dependency array may evaluate quickly, to support back-to-back issuance of short execution latency ops and their dependent ops. The second array may evaluate more slowly than the first dependency array.Type: ApplicationFiled: July 25, 2012Publication date: November 15, 2012Inventors: Andrew J. Beaumont-Smith, Honkai Tam, Daniel C. Murray, John H. Mylius, Peter J. Bannon, Pradeep Kanapathipillai
-
Patent number: 8301843Abstract: In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.Type: GrantFiled: December 30, 2009Date of Patent: October 30, 2012Assignee: Apple Inc.Inventors: Ramesh Gunna, Sudarshan Kadambi, Peter J. Bannon
-
Patent number: 8285937Abstract: In an embodiment, a processor may be configured to detect a store exclusive operation followed by a memory barrier operation in a speculative instruction stream being executed by the processor. The processor may fuse the store exclusive operation and the memory barrier operation, creating a fused operation. The fused operation may be transmitted and globally ordered, and the processor may complete both the store exclusive operation and the memory barrier operation in response to the fused operation. As the fused operation progresses through the processor and one or more other components (e.g. caches in the cache hierarchy) to the ordering point in the system, the fused operation may push previous memory operations to effect the memory barrier operation. In some embodiments, the latency for completing the store exclusive operation and the subsequent data memory barrier operation may be reduced if the store exclusive operation is successful at the ordering point.Type: GrantFiled: February 24, 2010Date of Patent: October 9, 2012Assignee: Apple Inc.Inventors: Peter J. Bannon, Po-Yung Chang
-
Patent number: 8255671Abstract: In an embodiment, a scheduler implements a first dependency array that tracks dependencies on instruction operations (ops) within a distance N of a given op and which are short execution latency ops. Other dependencies are tracked in a second dependency array. The first dependency array may evaluate quickly, to support back-to-back issuance of short execution latency ops and their dependent ops. The second array may evaluate more slowly than the first dependency array.Type: GrantFiled: December 18, 2008Date of Patent: August 28, 2012Assignee: Apple Inc.Inventors: Andrew J. Beaumont-Smith, Honkai Tam, Daniel C. Murray, John H. Mylius, Peter J. Bannon, Pradeep Kanapathipillai
-
Publication number: 20120047332Abstract: In an embodiment, a combining write buffer is configured to maintain one or more flush metrics to determine when to transmit write operations from buffer entries. The combining write buffer may be configured to dynamically modify the flush metrics in response to activity in the write buffer, modifying the conditions under which write operations are transmitted from the write buffer to the next lower level of memory. For example, in one implementation, the flush metrics may include categorizing write buffer entries as “collapsed.” A collapsed write buffer entry, and the collapsed write operations therein, may include at least one write operation that has overwritten data that was written by a previous write operation in the buffer entry. In another implementation, the combining write buffer may maintain the threshold of buffer fullness as a flush metric and may adjust it over time based on the actual buffer fullness.Type: ApplicationFiled: August 20, 2010Publication date: February 23, 2012Inventors: Peter J. Bannon, Andrew J. Beaumont-Smith, Ramesh Gunna, Wei-han Lien, Brian P. Lilly, Jaidev P. Patwardhan, Shih-Chieh R. Wen, Tse-Yu Yeh
-
Publication number: 20110208915Abstract: In an embodiment, a processor may be configured to detect a store exclusive operation followed by a memory barrier operation in a speculative instruction stream being executed by the processor. The processor may fuse the store exclusive operation and the memory barrier operation, creating a fused operation. The fused operation may be transmitted and globally ordered, and the processor may complete both the store exclusive operation and the memory barrier operation in response to the fused operation. As the fused operation progresses through the processor and one or more other components (e.g. caches in the cache hierarchy) to the ordering point in the system, the fused operation may push previous memory operations to effect the memory barrier operation. In some embodiments, the latency for completing the store exclusive operation and the subsequent data memory barrier operation may be reduced if the store exclusive operation is successful at the ordering point.Type: ApplicationFiled: February 24, 2010Publication date: August 25, 2011Inventors: Peter J. Bannon, Po-Yung Chang
-
Publication number: 20100162262Abstract: In an embodiment, a scheduler implements a first dependency array that tracks dependencies on instruction operations (ops) within a distance N of a given op and which are short execution latency ops. Other dependencies are tracked in a second dependency array. The first dependency array may evaluate quickly, to support back-to-back issuance of short execution latency ops and their dependent ops. The second array may evaluate more slowly than the first dependency array.Type: ApplicationFiled: December 18, 2008Publication date: June 24, 2010Inventors: Andrew J. Beaumont-Smith, Honkai Tam, Daniel C. Murray, John H. Mylius, Peter J. Bannon, Pradeep Kanapathipillai
-
Publication number: 20100106916Abstract: In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.Type: ApplicationFiled: December 30, 2009Publication date: April 29, 2010Inventors: Ramesh Gunna, Sudarshan Kadambi, Peter J. Bannon
-
Patent number: 7707361Abstract: In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.Type: GrantFiled: November 17, 2005Date of Patent: April 27, 2010Assignee: Apple Inc.Inventors: Ramesh Gunna, Sudarshan Kadambi, Peter J. Bannon
-
Patent number: 7152191Abstract: A multi-processor computer system permits various types of partitions to be implemented to contain and isolate hardware failures. The various types of partitions include hard, semi-hard, firm, and soft partitions. Each partition can include one or more processors. Upon detecting a failure associated with a processor, the connection to adjacent processors in the system can be severed, thereby precluding corrupted data from contaminating the rest of the system. If an inter-processor connection is severed, message traffic in the system can become congested as messages become backed up in other processors. Accordingly, each processor includes various timers to monitor for traffic congestion that may be due to a severed connection. Rather than letting the processor continue to wait to be able to transmit its messages, the timers will expire at preprogrammed time periods and the processor will take appropriate action, such as simply dropping queued messages, to keep the system from locking up.Type: GrantFiled: October 23, 2003Date of Patent: December 19, 2006Assignee: Hewlett-Packard Development Company, L.P.Inventors: Richard E. Kessler, Peter J. Bannon, Kourosh Gharachorloo, Thukalan V. Verghese
-
Patent number: 7024533Abstract: A computer system has a memory controller that includes read buffers coupled to a plurality of memory channels. The memory controller advantageously eliminates the inter-channel skew caused by memory modules being located at different distances from the memory controller. The memory controller preferably includes a channel interface and synchronization logic circuit for each memory channel. This circuit includes read and write buffers and load and unload pointers for the read buffer. Unload pointer logic generates the unload pointer and load pointer logic generates the load pointer. The pointers preferably are free-running pointers that increment in accordance with two different clock signals. The load pointer increments in accordance with a clock generated by the memory controller but that has been routed out to and back from the memory modules.Type: GrantFiled: May 20, 2003Date of Patent: April 4, 2006Assignee: Hewlett-Packard Development Company, L.P.Inventors: Richard E. Kessler, Peter J. Bannon, Maurice B. Steinman, Scott E. Breach, Allen J. Baum, Gregg A. Bouchard