Patents by Inventor John D. Pape

John D. Pape has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240095037
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying the presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for those instructions. The coprocessor prefetcher is further configured to issue, to a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Application
    Filed: July 28, 2023
    Publication date: March 21, 2024
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
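    The abstract describes two steps: spotting coprocessor instructions in the fetched stream, then prefetching their operand data before the coprocessor gets around to executing them. A minimal C sketch of that flow follows; the insn_t record, the coproc_cache_prefetch stub, and the 64-byte line size are illustrative assumptions, not details from the patent.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical decoded-instruction record: a coprocessor flag plus
     * the operand address the processor has already generated for it. */
    typedef struct {
        bool     is_coproc;
        uint64_t operand_addr;
    } insn_t;

    /* Stand-in for issuing a prefetch into a cache the coprocessor can see. */
    static void coproc_cache_prefetch(uint64_t addr) {
        printf("prefetch line 0x%llx for the coprocessor\n",
               (unsigned long long)addr);
    }

    /* Monitor a fetched code sequence; for each coprocessor instruction,
     * capture its operand address and prefetch the data ahead of the
     * coprocessor executing the instruction. */
    static void coproc_prefetcher_scan(const insn_t *seq, int n) {
        for (int i = 0; i < n; i++)
            if (seq[i].is_coproc)
                coproc_cache_prefetch(seq[i].operand_addr & ~0x3FULL); /* 64 B lines */
    }

    int main(void) {
        insn_t code[] = { { false, 0 }, { true, 0x1000 },
                          { false, 0 }, { true, 0x2040 } };
        coproc_prefetcher_scan(code, 4);
        return 0;
    }
    ```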
  • Patent number: 11900118
    Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, enabling a subsequent load instruction that targets a memory location associated with the particular store instruction to be performed using the subset of the dependency information retained in the rescue buffer circuit.
    Type: Grant
    Filed: August 5, 2022
    Date of Patent: February 13, 2024
    Assignee: Apple Inc.
    Inventors: John D. Pape, Francesco Spadini, Zhaoxiang Jin
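    Read one way, the rescue buffer lets a small subset of a store's dependency information outlive its store queue entry, so a later load to the same location can still use it. A rough C sketch under that reading follows; the structures, the four-slot capacity, and the FIFO replacement are all invented for illustration.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RESCUE_SLOTS 4  /* hypothetical capacity */

    /* Full dependency record a store queue entry might carry. */
    typedef struct {
        uint64_t addr;      /* target memory address */
        uint32_t store_id;  /* identifies the producing store */
        uint32_t extra;     /* other dependency state, dropped at release */
    } sq_entry_t;

    /* The rescue buffer keeps only a subset (address + id) after the
     * store queue entry has been released. */
    typedef struct { uint64_t addr; uint32_t store_id; bool valid; } rescue_t;

    static rescue_t rescue[RESCUE_SLOTS];
    static int rescue_next;

    /* On store release: retain the subset in the rescue buffer. */
    static void release_store(const sq_entry_t *e) {
        rescue[rescue_next] = (rescue_t){ e->addr, e->store_id, true };
        rescue_next = (rescue_next + 1) % RESCUE_SLOTS;  /* simple FIFO replace */
    }

    /* On a later load: look up the retained subset by address. */
    static bool load_check(uint64_t addr, uint32_t *store_id) {
        for (int i = 0; i < RESCUE_SLOTS; i++)
            if (rescue[i].valid && rescue[i].addr == addr) {
                *store_id = rescue[i].store_id;
                return true;
            }
        return false;
    }

    int main(void) {
        sq_entry_t st = { 0x80, 7, 0 };
        release_store(&st);            /* store leaves the store queue */
        uint32_t id;
        if (load_check(0x80, &id))     /* later load to the same location */
            printf("load rescued dependency on store %u\n", id);
        return 0;
    }
    ```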
  • Patent number: 11755333
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying the presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for those instructions. The coprocessor prefetcher is further configured to issue, to a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: September 12, 2023
    Assignee: Apple Inc.
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
  • Publication number: 20230236988
    Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know whether multiple translation entries are stored in the TLB for smaller splintered pages of the relatively large page. The TLB therefore tracks, with a per-context sticky bit, whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
    Type: Application
    Filed: March 24, 2023
    Publication date: July 27, 2023
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
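    The filter can be modeled in a few lines of C: a per-context bit is set when a splintered page is installed, a TLBI skips the splinter search when the bit is clear, and a full search doubles as the refresh pass that re-derives the bits. Entry layout, sizes, and the refresh pass below are toy assumptions, not the patented implementation.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NCTX 4   /* translation contexts tracked (toy) */
    #define NENT 8   /* toy TLB capacity */

    typedef struct { bool valid; int ctx; bool splinter; uint64_t va; } tlb_ent_t;

    static tlb_ent_t tlb[NENT];
    static bool sticky[NCTX];   /* "splintered pages may exist" per context */

    static void install(int i, int ctx, bool splinter, uint64_t va) {
        tlb[i] = (tlb_ent_t){ true, ctx, splinter, va };
        if (splinter) sticky[ctx] = true;  /* remember a splinter was installed */
    }

    /* TLBI hitting a large-page mapping: search for splinters only if the
     * sticky bit says some may exist for this context. */
    static void tlbi_large(int ctx) {
        if (!sticky[ctx]) {
            printf("ctx %d: sticky clear, splinter search skipped\n", ctx);
            return;
        }
        for (int i = 0; i < NENT; i++)          /* full search */
            if (tlb[i].valid && tlb[i].ctx == ctx && tlb[i].splinter)
                tlb[i].valid = false;
        /* refresh pass: re-derive every context's bit from what remains */
        for (int c = 0; c < NCTX; c++) sticky[c] = false;
        for (int i = 0; i < NENT; i++)
            if (tlb[i].valid && tlb[i].splinter) sticky[tlb[i].ctx] = true;
    }

    int main(void) {
        install(0, 1, true, 0x1000);  /* splinter installed under context 1 */
        tlbi_large(0);                /* skipped: context 0 never had splinters */
        tlbi_large(1);                /* full search; refresh clears the bit */
        tlbi_large(1);                /* skipped again */
        return 0;
    }
    ```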
  • Patent number: 11675710
    Abstract: Systems, apparatuses, and methods for limiting translation lookaside buffer (TLB) searches using active page size are described. A TLB stores virtual-to-physical address translations for a plurality of different page sizes. When the TLB receives a command to invalidate a TLB entry corresponding to a specified virtual address, the TLB performs, for the plurality of different page sizes, multiple different lookups of the indices corresponding to the specified virtual address. In order to reduce the number of lookups that are performed, the TLB relies on a page size presence vector and an age matrix to determine which page sizes to search and in which order. The page size presence vector indicates which page sizes may be stored for the specified virtual address. The age matrix stores a preferred search order with the most probable page size first and the least probable page size last.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: June 13, 2023
    Assignee: Apple Inc.
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
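    A small C model of the two structures may help: a presence bitmask prunes page sizes that cannot match, and a pairwise age matrix orders the remaining probes most-recently-used first. The four page sizes and the row-count sort are illustrative choices, not taken from the patent.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NSIZES 4
    static const char *size_name[NSIZES] = { "4K", "16K", "2M", "1G" };

    static uint8_t presence;          /* bit s set: size s may be present */
    static bool age[NSIZES][NSIZES];  /* age[a][b]: a used more recently than b */

    static void touch(int s) {        /* record an install/use of size s */
        presence |= 1u << s;
        for (int b = 0; b < NSIZES; b++) { age[s][b] = true; age[b][s] = false; }
    }

    /* Probe sizes most-recently-used first, skipping absent sizes. */
    static void invalidate_lookup(void) {
        int order[NSIZES];
        for (int s = 0; s < NSIZES; s++) order[s] = s;
        /* sort by row count of the age matrix: a fuller row is more recent */
        for (int i = 0; i < NSIZES; i++)
            for (int j = i + 1; j < NSIZES; j++) {
                int ci = 0, cj = 0;
                for (int b = 0; b < NSIZES; b++) {
                    ci += age[order[i]][b];
                    cj += age[order[j]][b];
                }
                if (cj > ci) { int t = order[i]; order[i] = order[j]; order[j] = t; }
            }
        for (int i = 0; i < NSIZES; i++) {
            int s = order[i];
            if (!(presence & (1u << s))) continue;  /* vector prunes the lookup */
            printf("probe %s index\n", size_name[s]);
        }
    }

    int main(void) {
        touch(0);            /* a 4K translation installed */
        touch(2);            /* then a 2M translation: probed first */
        invalidate_lookup();
        return 0;
    }
    ```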
  • Patent number: 11630771
    Abstract: An apparatus includes multiple processors including respective cache memories, the cache memories configured to store cache-entries for use by the processors. At least one of the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor, (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
    Type: Grant
    Filed: July 13, 2021
    Date of Patent: April 18, 2023
    Assignee: Apple Inc.
    Inventors: John D. Pape, Mahesh K. Reddy, Prasanna Utchani Varadharajan, Pruthivi Vuyyuru
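    The point of the deferral is that marking is cheap and the expensive sweep batches up at a synchronization point. A minimal C sketch follows, with a hypothetical line_t layout and a barrier-like on_sync_event standing in for whatever synchronization event the hardware actually observes.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NENT 4

    typedef struct { bool valid; bool inv_pending; uint64_t tag; } line_t;
    static line_t cache[NENT];

    /* Remote invalidation command: mark, but defer the actual invalidate. */
    static void on_inval_cmd(uint64_t tag) {
        for (int i = 0; i < NENT; i++)
            if (cache[i].valid && cache[i].tag == tag)
                cache[i].inv_pending = true;
    }

    /* Synchronization event (e.g. a barrier): now complete the work. */
    static void on_sync_event(void) {
        for (int i = 0; i < NENT; i++)
            if (cache[i].inv_pending) {
                cache[i].valid = false;
                cache[i].inv_pending = false;
            }
    }

    int main(void) {
        cache[0] = (line_t){ true, false, 0x40 };
        on_inval_cmd(0x40);  /* marked as intended for invalidation */
        printf("before sync: valid=%d pending=%d\n",
               cache[0].valid, cache[0].inv_pending);
        on_sync_event();
        printf("after sync:  valid=%d\n", cache[0].valid);
        return 0;
    }
    ```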
  • Patent number: 11615033
    Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know whether multiple translation entries are stored in the TLB for smaller splintered pages of the relatively large page. The TLB therefore tracks, with a per-context sticky bit, whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: March 28, 2023
    Assignee: Apple Inc.
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
  • Publication number: 20230092898
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying the presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for those instructions. The coprocessor prefetcher is further configured to issue, to a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Application
    Filed: December 10, 2021
    Publication date: March 23, 2023
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
  • Publication number: 20230017473
    Abstract: An apparatus includes multiple processors including respective cache memories, the cache memories configured to store cache-entries for use by the processors. At least one of the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor, (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
    Type: Application
    Filed: July 13, 2021
    Publication date: January 19, 2023
    Inventors: John D. Pape, Mahesh K. Reddy, Prasanna Utchani Varadharajan, Pruthivi Vuyyuru
  • Patent number: 11422946
    Abstract: Systems, apparatuses, and methods for implementing translation lookaside buffer (TLB) striping to enable efficient invalidation operations are described. TLB sizes are growing in width (more features in a given page table entry) and depth (to cover larger memory footprints). A striping scheme is proposed to enable an efficient, high-performance method for performing TLB maintenance operations in the face of this growth. Accordingly, a TLB stores first attribute data in a striped manner across a plurality of arrays. This striping allows different entries to be searched simultaneously in response to receiving an invalidation request which identifies a particular attribute of a group to be invalidated. Upon receiving an invalidation request, the TLB generates a plurality of indices with an offset between each index and walks through the plurality of arrays by incrementing each index and simultaneously checking the first attribute data in corresponding entries.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: August 23, 2022
    Assignee: Apple Inc.
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
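    The striping can be pictured as several attribute arrays read in lockstep: a walk of NROWS steps examines NARR entries per step, each array driven by its own offset index. The C below is a toy model with made-up dimensions; in hardware the inner loop would be a single parallel read.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NARR  4   /* attribute arrays read in parallel */
    #define NROWS 8   /* rows per array; TLB holds NARR*NROWS entries */

    /* attr[a][r] holds the invalidation-relevant attribute (e.g. a context
     * id) for one TLB entry; striping spreads entries across the arrays. */
    static uint16_t attr[NARR][NROWS];
    static bool     valid[NARR][NROWS];

    /* Invalidate all entries whose attribute matches. One row of every
     * array is examined per step, so the walk takes NROWS steps instead
     * of NARR*NROWS. */
    static void tlbi_by_attr(uint16_t match) {
        int idx[NARR];
        for (int a = 0; a < NARR; a++) idx[a] = a;  /* offset between indices */
        for (int step = 0; step < NROWS; step++) {
            for (int a = 0; a < NARR; a++) {        /* simultaneous in hardware */
                int r = idx[a] % NROWS;
                if (valid[a][r] && attr[a][r] == match) valid[a][r] = false;
                idx[a]++;                           /* increment each index */
            }
        }
    }

    int main(void) {
        valid[2][5] = true; attr[2][5] = 0xA;
        valid[1][3] = true; attr[1][3] = 0xB;
        tlbi_by_attr(0xA);
        printf("entry (2,5) valid=%d, entry (1,3) valid=%d\n",
               valid[2][5], valid[1][3]);
        return 0;
    }
    ```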
  • Publication number: 20220075734
    Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know whether multiple translation entries are stored in the TLB for smaller splintered pages of the relatively large page. The TLB therefore tracks, with a per-context sticky bit, whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
  • Publication number: 20220075735
    Abstract: Systems, apparatuses, and methods for limiting translation lookaside buffer (TLB) searches using active page size are described. A TLB stores virtual-to-physical address translations for a plurality of different page sizes. When the TLB receives a command to invalidate a TLB entry corresponding to a specified virtual address, the TLB performs, for the plurality of different page sizes, multiple different lookups of the indices corresponding to the specified virtual address. In order to reduce the number of lookups that are performed, the TLB relies on a page size presence vector and an age matrix to determine which page sizes to search and in which order. The page size presence vector indicates which page sizes may be stored for the specified virtual address. The age matrix stores a preferred search order with the most probable page size first and the least probable page size last.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
  • Publication number: 20220066947
    Abstract: Systems, apparatuses, and methods for implementing translation lookaside buffer (TLB) striping to enable efficient invalidation operations are described. TLB sizes are growing in width (more features in a given page table entry) and depth (to cover larger memory footprints). A striping scheme is proposed to enable an efficient, high-performance method for performing TLB maintenance operations in the face of this growth. Accordingly, a TLB stores first attribute data in a striped manner across a plurality of arrays. This striping allows different entries to be searched simultaneously in response to receiving an invalidation request which identifies a particular attribute of a group to be invalidated. Upon receiving an invalidation request, the TLB generates a plurality of indices with an offset between each index and walks through the plurality of arrays by incrementing each index and simultaneously checking the first attribute data in corresponding entries.
    Type: Application
    Filed: August 31, 2020
    Publication date: March 3, 2022
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
  • Patent number: 10289331
    Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multicore processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one buffer from a higher-level buffer may be automatically deposited into the next lower-level buffer if room exists there. Requesting strands of a core may fetch random data directly from the core's corresponding first-level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: May 14, 2019
    Assignee: Oracle International Corporation
    Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
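    The buffer hierarchy amounts to a multi-level FIFO fed by the RNG, with data trickling downward whenever room opens up and strands consuming only from the lowest level. The C sketch below models a single core; rand() merely stands in for the hardware generator, and the depths are arbitrary.

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define L2_DEPTH 8   /* higher-level buffer depth (arbitrary) */
    #define L1_DEPTH 2   /* per-core first-level buffer depth (arbitrary) */

    /* Stand-in for the hardware random data generator (RNG). */
    static uint64_t rng_generate(void) {
        return ((uint64_t)rand() << 32) | (uint64_t)rand();
    }

    typedef struct { uint64_t data[L2_DEPTH]; int count; } buf_t;

    static buf_t l2;        /* higher-level buffer, filled by the RNG */
    static buf_t l1_core0;  /* first-level buffer for one core */

    /* Background refill: the RNG keeps the higher-level buffer full, and
     * data trickles down whenever the lower-level buffer has room. */
    static void refill(void) {
        while (l2.count < L2_DEPTH) l2.data[l2.count++] = rng_generate();
        while (l1_core0.count < L1_DEPTH && l2.count > 0)
            l1_core0.data[l1_core0.count++] = l2.data[--l2.count];
    }

    /* A requesting strand consumes directly from its first-level buffer,
     * with no PIO round trip to the RNG itself. */
    static uint64_t strand_fetch_random(void) {
        if (l1_core0.count == 0) refill();  /* fall back only if drained */
        return l1_core0.data[--l1_core0.count];
    }

    int main(void) {
        refill();
        printf("strand got 0x%016llx\n",
               (unsigned long long)strand_fetch_random());
        return 0;
    }
    ```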
  • Publication number: 20190026040
    Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multicore processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one buffer from a higher-level buffer may be automatically deposited into the next lower-level buffer if room exists there. Requesting strands of a core may fetch random data directly from the core's corresponding first-level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
    Type: Application
    Filed: September 26, 2018
    Publication date: January 24, 2019
    Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
  • Patent number: 10114572
    Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multicore processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one buffer from a higher-level buffer may be automatically deposited into the next lower-level buffer if room exists there. Requesting strands of a core may fetch random data directly from the core's corresponding first-level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: October 30, 2018
    Assignee: Oracle International Corporation
    Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
  • Publication number: 20180157435
    Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multicore processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one buffer from a higher-level buffer may be automatically deposited into the next lower-level buffer if room exists there. Requesting strands of a core may fetch random data directly from the core's corresponding first-level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
    Type: Application
    Filed: December 6, 2016
    Publication date: June 7, 2018
    Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
  • Patent number: 8949545
    Abstract: A data processing device includes a load/store module to provide an interface between a processor device and a bus. In response to receiving a load or store instruction from the processor device, the load/store module determines a predicted coherency state of a cache line associated with the load or store instruction. Based on the predicted coherency state, the load/store module selects a bus transaction and communicates it to the bus. By selecting the bus transaction based on the predicted coherency state, the load/store module does not have to wait for all pending bus transactions to be serviced, providing greater predictability as to when bus transactions will be communicated to the bus and allowing the bus behavior to be more easily simulated.
    Type: Grant
    Filed: December 4, 2008
    Date of Patent: February 3, 2015
    Assignee: Freescale Semiconductor, Inc.
    Inventor: John D. Pape
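    In MESI-like terms, the selection logic reduces to a table from (access type, predicted state) to bus transaction. A compact C rendering of that table follows; the enums and the always-SHARED predictor stub are assumptions for illustration, not the device's actual protocol.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } coh_t;
    typedef enum { BUS_NONE, BUS_READ, BUS_RFO, BUS_UPGRADE } txn_t;

    /* Predict the line's coherency state from locally held tag state,
     * without waiting for pending bus transactions to drain. */
    static coh_t predict_state(uint64_t addr) {
        (void)addr;
        return SHARED;  /* stub: a real predictor consults the tag array */
    }

    /* Select the bus transaction directly from the predicted state. */
    static txn_t select_txn(int is_store, coh_t predicted) {
        if (!is_store)
            return predicted == INVALID ? BUS_READ : BUS_NONE;
        switch (predicted) {
        case INVALID: return BUS_RFO;      /* need both data and ownership */
        case SHARED:  return BUS_UPGRADE;  /* have data, need ownership */
        default:      return BUS_NONE;     /* already writable */
        }
    }

    int main(void) {
        txn_t t = select_txn(1, predict_state(0x1000));
        printf("store to predicted-shared line -> txn %d (BUS_UPGRADE = %d)\n",
               t, BUS_UPGRADE);
        return 0;
    }
    ```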
  • Publication number: 20100146217
    Abstract: A data processing device includes a load/store module to provide an interface between a processor device and a bus. In response to receiving a load or store instruction from the processor device, the load/store module determines a predicted coherency state of a cache line associated with the load or store instruction. Based on the predicted coherency state, the load/store module selects a bus transaction and communicates it to the bus. By selecting the bus transaction based on the predicted coherency state, the load/store module does not have to wait for all pending bus transactions to be serviced, providing greater predictability as to when bus transactions will be communicated to the bus and allowing the bus behavior to be more easily simulated.
    Type: Application
    Filed: December 4, 2008
    Publication date: June 10, 2010
    Applicant: Freescale Semiconductor, Inc.
    Inventor: John D. Pape
  • Patent number: 7516205
    Abstract: A system comprises a plurality of nodes coupled together, wherein each node has access to associated memory. Further, each node is adapted to transmit a memory request to at least one other node while concurrently decoding the request to determine which node contains the memory targeted by the request.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: April 7, 2009
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: John D. Pape
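    The claimed overlap is between transmitting the request and decoding its home node, so the decode drops off the critical path. The sequential C below can only suggest that overlap; the broadcast-to-all-nodes policy and the 1 GiB address interleave are invented details.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    #define NNODES 4

    /* Address decode: which node's memory holds this address. In the
     * patented arrangement this runs concurrently with the transmit
     * below rather than before it, hiding the decode latency. */
    static int decode_home_node(uint64_t addr) {
        return (int)((addr >> 30) % NNODES);  /* toy 1 GiB interleave */
    }

    static void send_request(int node, uint64_t addr) {
        printf("request for 0x%llx speculatively sent to node %d\n",
               (unsigned long long)addr, node);
    }

    static void memory_request(uint64_t addr) {
        for (int n = 0; n < NNODES; n++)    /* transmit immediately... */
            send_request(n, addr);
        int home = decode_home_node(addr);  /* ...decode resolves in parallel */
        printf("decode resolved: node %d services the request\n", home);
    }

    int main(void) {
        memory_request(0x80000000ULL);      /* decodes to node 2 */
        return 0;
    }
    ```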