Patents by Inventor John D. Pape

John D. Pape has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240095037
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying the presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for those instructions. The coprocessor prefetcher is further configured to issue, to a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Application
    Filed: July 28, 2023
    Publication date: March 21, 2024
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
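    The abstract describes two steps: spotting coprocessor instructions in the fetched stream, then prefetching their operand data before the coprocessor gets around to executing them. A minimal C sketch of that flow follows; the insn_t record, the coproc_cache_prefetch stub, and the 64-byte line size are illustrative assumptions, not details from the patent.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical decoded-instruction record: a coprocessor flag plus
     * the operand address the processor has already generated for it. */
    typedef struct {
        bool     is_coproc;
        uint64_t operand_addr;
    } insn_t;

    /* Stand-in for issuing a prefetch into a cache the coprocessor can see. */
    static void coproc_cache_prefetch(uint64_t addr) {
        printf("prefetch line 0x%llx for the coprocessor\n",
               (unsigned long long)addr);
    }

    /* Monitor a fetched code sequence; for each coprocessor instruction,
     * capture its operand address and prefetch the data ahead of the
     * coprocessor executing the instruction. */
    static void coproc_prefetcher_scan(const insn_t *seq, int n) {
        for (int i = 0; i < n; i++)
            if (seq[i].is_coproc)
                coproc_cache_prefetch(seq[i].operand_addr & ~0x3FULL); /* 64 B lines */
    }

    int main(void) {
        insn_t code[] = { { false, 0 }, { true, 0x1000 },
                          { false, 0 }, { true, 0x2040 } };
        coproc_prefetcher_scan(code, 4);
        return 0;
    }
    ```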
  • Patent number: 11900118
    Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, enabling a subsequent load instruction that targets a memory location associated with the particular store instruction to be performed using the subset of the dependency information retained in the rescue buffer circuit.
    Type: Grant
    Filed: August 5, 2022
    Date of Patent: February 13, 2024
    Assignee: Apple Inc.
    Inventors: John D. Pape, Francesco Spadini, Zhaoxiang Jin
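    Read one way, the rescue buffer lets a small subset of a store's dependency information outlive its store queue entry, so a later load to the same location can still use it. A rough C sketch under that reading follows; the structures, the four-slot capacity, and the FIFO replacement are all invented for illustration.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RESCUE_SLOTS 4  /* hypothetical capacity */

    /* Full dependency record a store queue entry might carry. */
    typedef struct {
        uint64_t addr;      /* target memory address */
        uint32_t store_id;  /* identifies the producing store */
        uint32_t extra;     /* other dependency state, dropped at release */
    } sq_entry_t;

    /* The rescue buffer keeps only a subset (address + id) after the
     * store queue entry has been released. */
    typedef struct { uint64_t addr; uint32_t store_id; bool valid; } rescue_t;

    static rescue_t rescue[RESCUE_SLOTS];
    static int rescue_next;

    /* On store release: retain the subset in the rescue buffer. */
    static void release_store(const sq_entry_t *e) {
        rescue[rescue_next] = (rescue_t){ e->addr, e->store_id, true };
        rescue_next = (rescue_next + 1) % RESCUE_SLOTS;  /* simple FIFO replace */
    }

    /* On a later load: look up the retained subset by address. */
    static bool load_check(uint64_t addr, uint32_t *store_id) {
        for (int i = 0; i < RESCUE_SLOTS; i++)
            if (rescue[i].valid && rescue[i].addr == addr) {
                *store_id = rescue[i].store_id;
                return true;
            }
        return false;
    }

    int main(void) {
        sq_entry_t st = { 0x80, 7, 0 };
        release_store(&st);            /* store leaves the store queue */
        uint32_t id;
        if (load_check(0x80, &id))     /* later load to the same location */
            printf("load rescued dependency on store %u\n", id);
        return 0;
    }
    ```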
  • Patent number: 11755333
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying the presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for those instructions. The coprocessor prefetcher is further configured to issue, to a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: September 12, 2023
    Assignee: Apple Inc.
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
  • Publication number: 20230236988
    Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know whether multiple translation entries are stored in the TLB for smaller splintered pages of the relatively large page. The TLB therefore tracks, with a per-context sticky bit, whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
    Type: Application
    Filed: March 24, 2023
    Publication date: July 27, 2023
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
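    The filter can be modeled in a few lines of C: a per-context bit is set when a splintered page is installed, a TLBI skips the splinter search when the bit is clear, and a full search doubles as the refresh pass that re-derives the bits. Entry layout, sizes, and the refresh pass below are toy assumptions, not the patented implementation.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NCTX 4   /* translation contexts tracked (toy) */
    #define NENT 8   /* toy TLB capacity */

    typedef struct { bool valid; int ctx; bool splinter; uint64_t va; } tlb_ent_t;

    static tlb_ent_t tlb[NENT];
    static bool sticky[NCTX];   /* "splintered pages may exist" per context */

    static void install(int i, int ctx, bool splinter, uint64_t va) {
        tlb[i] = (tlb_ent_t){ true, ctx, splinter, va };
        if (splinter) sticky[ctx] = true;  /* remember a splinter was installed */
    }

    /* TLBI hitting a large-page mapping: search for splinters only if the
     * sticky bit says some may exist for this context. */
    static void tlbi_large(int ctx) {
        if (!sticky[ctx]) {
            printf("ctx %d: sticky clear, splinter search skipped\n", ctx);
            return;
        }
        for (int i = 0; i < NENT; i++)          /* full search */
            if (tlb[i].valid && tlb[i].ctx == ctx && tlb[i].splinter)
                tlb[i].valid = false;
        /* refresh pass: re-derive every context's bit from what remains */
        for (int c = 0; c < NCTX; c++) sticky[c] = false;
        for (int i = 0; i < NENT; i++)
            if (tlb[i].valid && tlb[i].splinter) sticky[tlb[i].ctx] = true;
    }

    int main(void) {
        install(0, 1, true, 0x1000);  /* splinter installed under context 1 */
        tlbi_large(0);                /* skipped: context 0 never had splinters */
        tlbi_large(1);                /* full search; refresh clears the bit */
        tlbi_large(1);                /* skipped again */
        return 0;
    }
    ```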
  • Patent number: 11675710
    Abstract: Systems, apparatuses, and methods for limiting translation lookaside buffer (TLB) searches using active page size are described. A TLB stores virtual-to-physical address translations for a plurality of different page sizes. When the TLB receives a command to invalidate a TLB entry corresponding to a specified virtual address, the TLB performs, for the plurality of different page sizes, multiple different lookups of the indices corresponding to the specified virtual address. In order to reduce the number of lookups that are performed, the TLB relies on a page size presence vector and an age matrix to determine which page sizes to search and in which order. The page size presence vector indicates which page sizes may be stored for the specified virtual address. The age matrix stores a preferred search order with the most probable page size first and the least probable page size last.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: June 13, 2023
    Assignee: Apple Inc.
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
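    A small C model of the two structures may help: a presence bitmask prunes page sizes that cannot match, and a pairwise age matrix orders the remaining probes most-recently-used first. The four page sizes and the row-count sort are illustrative choices, not taken from the patent.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NSIZES 4
    static const char *size_name[NSIZES] = { "4K", "16K", "2M", "1G" };

    static uint8_t presence;          /* bit s set: size s may be present */
    static bool age[NSIZES][NSIZES];  /* age[a][b]: a used more recently than b */

    static void touch(int s) {        /* record an install/use of size s */
        presence |= 1u << s;
        for (int b = 0; b < NSIZES; b++) { age[s][b] = true; age[b][s] = false; }
    }

    /* Probe sizes most-recently-used first, skipping absent sizes. */
    static void invalidate_lookup(void) {
        int order[NSIZES];
        for (int s = 0; s < NSIZES; s++) order[s] = s;
        /* sort by row count of the age matrix: a fuller row is more recent */
        for (int i = 0; i < NSIZES; i++)
            for (int j = i + 1; j < NSIZES; j++) {
                int ci = 0, cj = 0;
                for (int b = 0; b < NSIZES; b++) {
                    ci += age[order[i]][b];
                    cj += age[order[j]][b];
                }
                if (cj > ci) { int t = order[i]; order[i] = order[j]; order[j] = t; }
            }
        for (int i = 0; i < NSIZES; i++) {
            int s = order[i];
            if (!(presence & (1u << s))) continue;  /* vector prunes the lookup */
            printf("probe %s index\n", size_name[s]);
        }
    }

    int main(void) {
        touch(0);            /* a 4K translation installed */
        touch(2);            /* then a 2M translation: probed first */
        invalidate_lookup();
        return 0;
    }
    ```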
  • Patent number: 11630771
    Abstract: An apparatus includes multiple processors including respective cache memories, the cache memories configured to store cache-entries for use by the processors. At least one of the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor, (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
    Type: Grant
    Filed: July 13, 2021
    Date of Patent: April 18, 2023
    Assignee: Apple Inc.
    Inventors: John D. Pape, Mahesh K. Reddy, Prasanna Utchani Varadharajan, Pruthivi Vuyyuru
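    The point of the deferral is that marking is cheap and the expensive sweep batches up at a synchronization point. A minimal C sketch follows, with a hypothetical line_t layout and a barrier-like on_sync_event standing in for whatever synchronization event the hardware actually observes.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NENT 4

    typedef struct { bool valid; bool inv_pending; uint64_t tag; } line_t;
    static line_t cache[NENT];

    /* Remote invalidation command: mark, but defer the actual invalidate. */
    static void on_inval_cmd(uint64_t tag) {
        for (int i = 0; i < NENT; i++)
            if (cache[i].valid && cache[i].tag == tag)
                cache[i].inv_pending = true;
    }

    /* Synchronization event (e.g. a barrier): now complete the work. */
    static void on_sync_event(void) {
        for (int i = 0; i < NENT; i++)
            if (cache[i].inv_pending) {
                cache[i].valid = false;
                cache[i].inv_pending = false;
            }
    }

    int main(void) {
        cache[0] = (line_t){ true, false, 0x40 };
        on_inval_cmd(0x40);  /* marked as intended for invalidation */
        printf("before sync: valid=%d pending=%d\n",
               cache[0].valid, cache[0].inv_pending);
        on_sync_event();
        printf("after sync:  valid=%d\n", cache[0].valid);
        return 0;
    }
    ```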
  • Patent number: 11615033
    Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know whether multiple translation entries are stored in the TLB for smaller splintered pages of the relatively large page. The TLB therefore tracks, with a per-context sticky bit, whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: March 28, 2023
    Assignee: Apple Inc.
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
  • Publication number: 20230092898
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying the presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for those instructions. The coprocessor prefetcher is further configured to issue, to a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Application
    Filed: December 10, 2021
    Publication date: March 23, 2023
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
  • Publication number: 20230017473
    Abstract: An apparatus includes multiple processors including respective cache memories, the cache memories configured to store cache-entries for use by the processors. At least one of the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor, (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
    Type: Application
    Filed: July 13, 2021
    Publication date: January 19, 2023
    Inventors: John D. Pape, Mahesh K. Reddy, Prasanna Utchani Varadharajan, Pruthivi Vuyyuru
  • Patent number: 11422946
    Abstract: Systems, apparatuses, and methods for implementing translation lookaside buffer (TLB) striping to enable efficient invalidation operations are described. TLB sizes are growing in width (more features in a given page table entry) and depth (to cover larger memory footprints). A striping scheme is proposed to enable an efficient, high-performance method for performing TLB maintenance operations in the face of this growth. Accordingly, a TLB stores first attribute data in a striped manner across a plurality of arrays. This striping allows different entries to be searched simultaneously in response to receiving an invalidation request which identifies a particular attribute of a group to be invalidated. Upon receiving an invalidation request, the TLB generates a plurality of indices with an offset between each index and walks through the plurality of arrays by incrementing each index and simultaneously checking the first attribute data in corresponding entries.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: August 23, 2022
    Assignee: Apple Inc.
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
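    The striping can be pictured as several attribute arrays read in lockstep: a walk of NROWS steps examines NARR entries per step, each array driven by its own offset index. The C below is a toy model with made-up dimensions; in hardware the inner loop would be a single parallel read.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NARR  4   /* attribute arrays read in parallel */
    #define NROWS 8   /* rows per array; TLB holds NARR*NROWS entries */

    /* attr[a][r] holds the invalidation-relevant attribute (e.g. a context
     * id) for one TLB entry; striping spreads entries across the arrays. */
    static uint16_t attr[NARR][NROWS];
    static bool     valid[NARR][NROWS];

    /* Invalidate all entries whose attribute matches. One row of every
     * array is examined per step, so the walk takes NROWS steps instead
     * of NARR*NROWS. */
    static void tlbi_by_attr(uint16_t match) {
        int idx[NARR];
        for (int a = 0; a < NARR; a++) idx[a] = a;  /* offset between indices */
        for (int step = 0; step < NROWS; step++) {
            for (int a = 0; a < NARR; a++) {        /* simultaneous in hardware */
                int r = idx[a] % NROWS;
                if (valid[a][r] && attr[a][r] == match) valid[a][r] = false;
                idx[a]++;                           /* increment each index */
            }
        }
    }

    int main(void) {
        valid[2][5] = true; attr[2][5] = 0xA;
        valid[1][3] = true; attr[1][3] = 0xB;
        tlbi_by_attr(0xA);
        printf("entry (2,5) valid=%d, entry (1,3) valid=%d\n",
               valid[2][5], valid[1][3]);
        return 0;
    }
    ```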
  • Publication number: 20220075734
    Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know whether multiple translation entries are stored in the TLB for smaller splintered pages of the relatively large page. The TLB therefore tracks, with a per-context sticky bit, whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
  • Publication number: 20220075735
    Abstract: Systems, apparatuses, and methods for limiting translation lookaside buffer (TLB) searches using active page size are described. A TLB stores virtual-to-physical address translations for a plurality of different page sizes. When the TLB receives a command to invalidate a TLB entry corresponding to a specified virtual address, the TLB performs, for the plurality of different page sizes, multiple different lookups of the indices corresponding to the specified virtual address. In order to reduce the number of lookups that are performed, the TLB relies on a page size presence vector and an age matrix to determine which page sizes to search and in which order. The page size presence vector indicates which page sizes may be stored for the specified virtual address. The age matrix stores a preferred search order with the most probable page size first and the least probable page size last.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
  • Publication number: 20220066947
    Abstract: Systems, apparatuses, and methods for implementing translation lookaside buffer (TLB) striping to enable efficient invalidation operations are described. TLB sizes are growing in width (more features in a given page table entry) and depth (to cover larger memory footprints). A striping scheme is proposed to enable an efficient, high-performance method for performing TLB maintenance operations in the face of this growth. Accordingly, a TLB stores first attribute data in a striped manner across a plurality of arrays. This striping allows different entries to be searched simultaneously in response to receiving an invalidation request which identifies a particular attribute of a group to be invalidated. Upon receiving an invalidation request, the TLB generates a plurality of indices with an offset between each index and walks through the plurality of arrays by incrementing each index and simultaneously checking the first attribute data in corresponding entries.
    Type: Application
    Filed: August 31, 2020
    Publication date: March 3, 2022
    Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
  • Patent number: 10289331
    Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multicore processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one buffer from a higher-level buffer may be automatically deposited into the next lower-level buffer if room exists there. Requesting strands of a core may fetch random data directly from the core's corresponding first-level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: May 14, 2019
    Assignee: Oracle International Corporation
    Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
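    The buffer hierarchy amounts to a multi-level FIFO fed by the RNG, with data trickling downward whenever room opens up and strands consuming only from the lowest level. The C sketch below models a single core; rand() merely stands in for the hardware generator, and the depths are arbitrary.

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define L2_DEPTH 8   /* higher-level buffer depth (arbitrary) */
    #define L1_DEPTH 2   /* per-core first-level buffer depth (arbitrary) */

    /* Stand-in for the hardware random data generator (RNG). */
    static uint64_t rng_generate(void) {
        return ((uint64_t)rand() << 32) | (uint64_t)rand();
    }

    typedef struct { uint64_t data[L2_DEPTH]; int count; } buf_t;

    static buf_t l2;        /* higher-level buffer, filled by the RNG */
    static buf_t l1_core0;  /* first-level buffer for one core */

    /* Background refill: the RNG keeps the higher-level buffer full, and
     * data trickles down whenever the lower-level buffer has room. */
    static void refill(void) {
        while (l2.count < L2_DEPTH) l2.data[l2.count++] = rng_generate();
        while (l1_core0.count < L1_DEPTH && l2.count > 0)
            l1_core0.data[l1_core0.count++] = l2.data[--l2.count];
    }

    /* A requesting strand consumes directly from its first-level buffer,
     * with no PIO round trip to the RNG itself. */
    static uint64_t strand_fetch_random(void) {
        if (l1_core0.count == 0) refill();  /* fall back only if drained */
        return l1_core0.data[--l1_core0.count];
    }

    int main(void) {
        refill();
        printf("strand got 0x%016llx\n",
               (unsigned long long)strand_fetch_random());
        return 0;
    }
    ```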
  • Publication number: 20190026040
    Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multicore processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one buffer from a higher-level buffer may be automatically deposited into the next lower-level buffer if room exists there. Requesting strands of a core may fetch random data directly from the core's corresponding first-level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
    Type: Application
    Filed: September 26, 2018
    Publication date: January 24, 2019
    Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
  • Patent number: 10114572
    Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multicore processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one buffer from a higher-level buffer may be automatically deposited into the next lower-level buffer if room exists there. Requesting strands of a core may fetch random data directly from the core's corresponding first-level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: October 30, 2018
    Assignee: Oracle International Corporation
    Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
  • Publication number: 20180157435
    Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multicore processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one buffer from a higher-level buffer may be automatically deposited into the next lower-level buffer if room exists there. Requesting strands of a core may fetch random data directly from the core's corresponding first-level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
    Type: Application
    Filed: December 6, 2016
    Publication date: June 7, 2018
    Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
  • Patent number: 8949545
    Abstract: A data processing device includes a load/store module to provide an interface between a processor device and a bus. In response to receiving a load or store instruction from the processor device, the load/store module determines a predicted coherency state of a cache line associated with the load or store instruction. Based on the predicted coherency state, the load/store module selects a bus transaction and communicates it to the bus. By selecting the bus transaction based on the predicted coherency state, the load/store module does not have to wait for all pending bus transactions to be serviced, providing greater predictability as to when bus transactions will be communicated to the bus and allowing the bus behavior to be more easily simulated.
    Type: Grant
    Filed: December 4, 2008
    Date of Patent: February 3, 2015
    Assignee: Freescale Semiconductor, Inc.
    Inventor: John D. Pape
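    In MESI-like terms, the selection logic reduces to a table from (access type, predicted state) to bus transaction. A compact C rendering of that table follows; the enums and the always-SHARED predictor stub are assumptions for illustration, not the device's actual protocol.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } coh_t;
    typedef enum { BUS_NONE, BUS_READ, BUS_RFO, BUS_UPGRADE } txn_t;

    /* Predict the line's coherency state from locally held tag state,
     * without waiting for pending bus transactions to drain. */
    static coh_t predict_state(uint64_t addr) {
        (void)addr;
        return SHARED;  /* stub: a real predictor consults the tag array */
    }

    /* Select the bus transaction directly from the predicted state. */
    static txn_t select_txn(int is_store, coh_t predicted) {
        if (!is_store)
            return predicted == INVALID ? BUS_READ : BUS_NONE;
        switch (predicted) {
        case INVALID: return BUS_RFO;      /* need both data and ownership */
        case SHARED:  return BUS_UPGRADE;  /* have data, need ownership */
        default:      return BUS_NONE;     /* already writable */
        }
    }

    int main(void) {
        txn_t t = select_txn(1, predict_state(0x1000));
        printf("store to predicted-shared line -> txn %d (BUS_UPGRADE = %d)\n",
               t, BUS_UPGRADE);
        return 0;
    }
    ```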
  • Publication number: 20100146217
    Abstract: A data processing device includes a load/store module to provide an interface between a processor device and a bus. In response to receiving a load or store instruction from the processor device, the load/store module determines a predicted coherency state of a cache line associated with the load or store instruction. Based on the predicted coherency state, the load/store module selects a bus transaction and communicates it to the bus. By selecting the bus transaction based on the predicted coherency state, the load/store module does not have to wait for all pending bus transactions to be serviced, providing greater predictability as to when bus transactions will be communicated to the bus and allowing the bus behavior to be more easily simulated.
    Type: Application
    Filed: December 4, 2008
    Publication date: June 10, 2010
    Applicant: Freescale Semiconductor, Inc.
    Inventor: John D. Pape
  • Patent number: 7516205
    Abstract: A system comprises a plurality of nodes coupled together, wherein each node has access to associated memory. Further, each node is adapted to transmit a memory request to at least one other node while concurrently decoding the request to determine which node contains the memory targeted by the request.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: April 7, 2009
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: John D. Pape
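    The claimed overlap is between transmitting the request and decoding its home node, so the decode drops off the critical path. The sequential C below can only suggest that overlap; the broadcast-to-all-nodes policy and the 1 GiB address interleave are invented details.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    #define NNODES 4

    /* Address decode: which node's memory holds this address. In the
     * patented arrangement this runs concurrently with the transmit
     * below rather than before it, hiding the decode latency. */
    static int decode_home_node(uint64_t addr) {
        return (int)((addr >> 30) % NNODES);  /* toy 1 GiB interleave */
    }

    static void send_request(int node, uint64_t addr) {
        printf("request for 0x%llx speculatively sent to node %d\n",
               (unsigned long long)addr, node);
    }

    static void memory_request(uint64_t addr) {
        for (int n = 0; n < NNODES; n++)    /* transmit immediately... */
            send_request(n, addr);
        int home = decode_home_node(addr);  /* ...decode resolves in parallel */
        printf("decode resolved: node %d services the request\n", home);
    }

    int main(void) {
        memory_request(0x80000000ULL);      /* decodes to node 2 */
        return 0;
    }
    ```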