Patents by Inventor John D. Pape
John D. Pape has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240095037
Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
Type: Application
Filed: July 28, 2023
Publication date: March 21, 2024
Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
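The idea in this abstract can be sketched in illustrative Python. The class and field names (`CoprocessorPrefetcher`, `instruction["unit"]`, and so on) are invented for the sketch and are not the patented implementation; the point is only the flow: the prefetcher watches the processor's fetched stream, captures operand addresses of coprocessor instructions, and warms the coprocessor-visible cache before the coprocessor executes.

```python
class Cache:
    """Stand-in for a cache memory accessible to the coprocessor."""

    def __init__(self):
        self.lines = set()

    def prefetch(self, addr):
        # After a prefetch, the line is resident before execution.
        self.lines.add(addr)


class CoprocessorPrefetcher:
    """Monitors instructions fetched by the main processor and issues
    prefetches for coprocessor operand data ahead of execution."""

    def __init__(self, cache):
        self.cache = cache
        self.prefetched = []  # addresses captured so far

    def observe(self, instruction):
        # The processor generates the operand address; the prefetcher
        # captures it only for instructions destined for the coprocessor.
        if instruction["unit"] == "coprocessor":
            addr = instruction["operand_addr"]
            self.cache.prefetch(addr)
            self.prefetched.append(addr)
```

As a usage sketch: feeding a mixed code sequence through `observe` prefetches only the coprocessor operands, so the coprocessor later hits in its cache instead of stalling on memory.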
-
Patent number: 11900118
Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.
Type: Grant
Filed: August 5, 2022
Date of Patent: February 13, 2024
Assignee: Apple Inc.
Inventors: John D. Pape, Francesco Spadini, Zhaoxiang Jin
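A minimal sketch of the rescue-buffer idea, assuming invented names and a dictionary-based model: the store queue holds full dependency information until a store is released, a control step copies a subset into the rescue buffer, and a later load can still use that subset after the store queue entry is gone.

```python
class RescueBuffer:
    """Retains a subset of dependency info past store-queue release."""

    def __init__(self):
        self.entries = {}  # addr -> retained dependency subset

    def retain(self, addr, subset):
        self.entries[addr] = subset


class StoreQueue:
    """Buffers full dependency info for a store until it is released."""

    def __init__(self, rescue):
        self.pending = {}
        self.rescue = rescue

    def insert(self, addr, dep_info):
        self.pending[addr] = dep_info
        # Control logic writes a subset to the rescue buffer, so it
        # survives the release of the store queue entry.
        self.rescue.retain(addr, {"store_id": dep_info["store_id"]})

    def release(self, addr):
        # The store is released to execute; its queue entry is freed.
        return self.pending.pop(addr)


def perform_load(addr, store_queue, rescue):
    # A subsequent load to the same location prefers the live store
    # queue entry, and falls back to the retained rescue subset.
    if addr in store_queue.pending:
        return store_queue.pending[addr]
    return rescue.entries.get(addr)
```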
-
Patent number: 11755333
Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
Type: Grant
Filed: December 10, 2021
Date of Patent: September 12, 2023
Assignee: Apple Inc.
Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
-
Publication number: 20230236988
Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know if there are multiple translation entries stored in the TLB for smaller splintered pages of the relatively large page. The TLB tracks whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
Type: Application
Filed: March 24, 2023
Publication date: July 27, 2023
Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
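The sticky-bit scheme in this abstract can be modeled with a short Python sketch. Names and data layout are illustrative assumptions, and a Python list scan obviously does not model hardware cost; the point is the control decision: skip the expensive full splinter search when the per-context sticky bit is clear, and refresh the other contexts' sticky bits whenever a full search happens anyway.

```python
class SplinterAwareTLB:
    """Tracks, per translation context, whether any splintered small
    pages of a large page were ever installed (a sticky bit)."""

    def __init__(self):
        self.entries = []  # (context, vaddr, splintered?)
        self.sticky = {}   # context -> splintered pages installed

    def install(self, context, vaddr, splintered=False):
        self.entries.append((context, vaddr, splintered))
        if splintered:
            self.sticky[context] = True

    def invalidate(self, context):
        if not self.sticky.get(context, False):
            # Sticky bit clear: no splintered pages were ever installed
            # for this context, so no full splinter search is needed.
            self.entries = [e for e in self.entries if e[0] != context]
            return "targeted"
        # Full search. While scanning, rescan the other contexts and
        # rebuild their sticky bits, clearing any with no splinters left.
        seen_splinters = set()
        kept = []
        for entry in self.entries:
            if entry[0] == context:
                continue  # invalidated
            if entry[2]:
                seen_splinters.add(entry[0])
            kept.append(entry)
        self.entries = kept
        self.sticky = {c: True for c in seen_splinters}
        return "full"
```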
-
Patent number: 11675710
Abstract: Systems, apparatuses, and methods for limiting translation lookaside buffer (TLB) searches using active page size are described. A TLB stores virtual-to-physical address translations for a plurality of different page sizes. When the TLB receives a command to invalidate a TLB entry corresponding to a specified virtual address, the TLB performs, for the plurality of different page sizes, multiple different lookups of the indices corresponding to the specified virtual address. In order to reduce the number of lookups that are performed, the TLB relies on a page size presence vector and an age matrix to determine which page sizes to search for and in which order. The page size presence vector indicates which page sizes may be stored for the specified virtual address. The age matrix stores a preferred search order with the most probable page size first and the least probable page size last.
Type: Grant
Filed: September 9, 2020
Date of Patent: June 13, 2023
Assignee: Apple Inc.
Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
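A simplified sketch of the search-pruning idea, under loud assumptions: the presence vector here is a single per-size bit (the patent describes a vector per virtual address), and the age matrix is modeled as a most-recently-installed-first ordering. The mechanism it illustrates matches the abstract: skip lookups for page sizes that cannot be present, and probe the remaining sizes most-probable-first.

```python
class SizeOrderedTLB:
    """Prunes and orders per-page-size lookups during invalidation."""

    PAGE_SIZES = [4096, 16384, 2 * 1024 * 1024]

    def __init__(self):
        self.entries = {}  # (vaddr // size, size) -> paddr
        # Presence "vector": simplified to one bit per page size.
        self.presence = {s: False for s in self.PAGE_SIZES}
        # "Age matrix": simplified to a most-probable-first ordering.
        self.search_order = list(self.PAGE_SIZES)

    def install(self, vaddr, size, paddr):
        self.entries[(vaddr // size, size)] = paddr
        self.presence[size] = True
        # Promote the most recently used size to the front of the order.
        self.search_order.remove(size)
        self.search_order.insert(0, size)

    def invalidate(self, vaddr):
        lookups = 0
        for size in self.search_order:
            if not self.presence[size]:
                continue  # presence vector prunes this lookup entirely
            lookups += 1
            if self.entries.pop((vaddr // size, size), None) is not None:
                break  # found and invalidated; stop searching
        return lookups
```

With only 4 KiB translations installed, an invalidation costs one lookup rather than one per supported page size, which is the saving the abstract describes.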
-
Patent number: 11630771
Abstract: An apparatus includes multiple processors including respective cache memories, the cache memories configured to cache cache-entries for use by the processors. At least a processor among the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor, (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
Type: Grant
Filed: July 13, 2021
Date of Patent: April 18, 2023
Assignee: Apple Inc.
Inventors: John D. Pape, Mahesh K. Reddy, Prasanna Utchani Varadharajan, Pruthivi Vuyyuru
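The three steps in this abstract map directly onto a small sketch, with invented names (`CacheManager`, `synchronize`) standing in for the cache management logic: invalidation commands only mark entries, and the actual invalidation is batched at the synchronization event.

```python
class CacheManager:
    """Defers cache-entry invalidation until a synchronization event."""

    def __init__(self):
        self.cache = {}     # addr -> cached data
        self.marked = set() # entries intended for invalidation

    def receive_invalidation(self, addr):
        # (i) receive the command, (ii) mark the entry but defer the
        # actual invalidation; the entry stays usable until sync.
        if addr in self.cache:
            self.marked.add(addr)

    def synchronize(self):
        # (iii) on the synchronization event, invalidate everything
        # that was marked, in one batch.
        for addr in self.marked:
            self.cache.pop(addr, None)
        self.marked.clear()
```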
-
Patent number: 11615033
Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know if there are multiple translation entries stored in the TLB for smaller splintered pages of the relatively large page. The TLB tracks whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
Type: Grant
Filed: September 9, 2020
Date of Patent: March 28, 2023
Assignee: Apple Inc.
Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
-
Publication number: 20230092898
Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
Type: Application
Filed: December 10, 2021
Publication date: March 23, 2023
Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
-
Publication number: 20230017473
Abstract: An apparatus includes multiple processors including respective cache memories, the cache memories configured to cache cache-entries for use by the processors. At least a processor among the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor, (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
Type: Application
Filed: July 13, 2021
Publication date: January 19, 2023
Inventors: John D. Pape, Mahesh K. Reddy, Prasanna Utchani Varadharajan, Pruthivi Vuyyuru
-
Patent number: 11422946
Abstract: Systems, apparatuses, and methods for implementing translation lookaside buffer (TLB) striping to enable efficient invalidation operations are described. TLB sizes are growing in width (more features in a given page table entry) and depth (to cover larger memory footprints). A striping scheme is proposed to enable an efficient and high performance method for performing TLB maintenance operations in the face of this growth. Accordingly, a TLB stores first attribute data in a striped manner across a plurality of arrays. The striped manner allows different entries to be searched simultaneously in response to receiving an invalidation request which identifies a particular attribute of a group to be invalidated. Upon receiving an invalidation request, the TLB generates a plurality of indices with an offset between each index and walks through the plurality of arrays by incrementing each index and simultaneously checking the first attribute data in corresponding entries.
Type: Grant
Filed: August 31, 2020
Date of Patent: August 23, 2022
Assignee: Apple Inc.
Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
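The striped walk can be sketched as follows. The array count, depth, and striping rule are illustrative choices, and a sequential Python loop only models what the hardware does in parallel: each step checks one staggered row in every array at once, so a whole group can be invalidated in `depth` steps instead of one probe per entry.

```python
class StripedTLB:
    """Models attribute data striped across several arrays so an
    invalidation request can probe all arrays in the same step."""

    def __init__(self, num_arrays=4, depth=8):
        self.num_arrays = num_arrays
        self.depth = depth
        self.arrays = [[None] * depth for _ in range(num_arrays)]

    def install(self, index, attribute):
        # Entry i is striped into array i % num_arrays, row i // num_arrays.
        self.arrays[index % self.num_arrays][index // self.num_arrays] = attribute

    def invalidate_group(self, attribute):
        # Generate one index per array, offset from each other, and walk
        # all arrays simultaneously by incrementing each index.
        hits, steps = 0, 0
        for step in range(self.depth):
            steps += 1
            for a in range(self.num_arrays):
                row = (step + a) % self.depth  # offset between indices
                if self.arrays[a][row] == attribute:
                    self.arrays[a][row] = None
                    hits += 1
        return hits, steps
```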
-
Publication number: 20220075734
Abstract: Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know if there are multiple translation entries stored in the TLB for smaller splintered pages of the relatively large page. The TLB tracks whether or not splintered pages for each translation context have been installed. If a TLB invalidate (TLBI) request is received, and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found, the sticky bit can be cleared and the number of full TLBI searches is reduced.
Type: Application
Filed: September 9, 2020
Publication date: March 10, 2022
Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
-
Publication number: 20220075735
Abstract: Systems, apparatuses, and methods for limiting translation lookaside buffer (TLB) searches using active page size are described. A TLB stores virtual-to-physical address translations for a plurality of different page sizes. When the TLB receives a command to invalidate a TLB entry corresponding to a specified virtual address, the TLB performs, for the plurality of different page sizes, multiple different lookups of the indices corresponding to the specified virtual address. In order to reduce the number of lookups that are performed, the TLB relies on a page size presence vector and an age matrix to determine which page sizes to search for and in which order. The page size presence vector indicates which page sizes may be stored for the specified virtual address. The age matrix stores a preferred search order with the most probable page size first and the least probable page size last.
Type: Application
Filed: September 9, 2020
Publication date: March 10, 2022
Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
-
Publication number: 20220066947
Abstract: Systems, apparatuses, and methods for implementing translation lookaside buffer (TLB) striping to enable efficient invalidation operations are described. TLB sizes are growing in width (more features in a given page table entry) and depth (to cover larger memory footprints). A striping scheme is proposed to enable an efficient and high performance method for performing TLB maintenance operations in the face of this growth. Accordingly, a TLB stores first attribute data in a striped manner across a plurality of arrays. The striped manner allows different entries to be searched simultaneously in response to receiving an invalidation request which identifies a particular attribute of a group to be invalidated. Upon receiving an invalidation request, the TLB generates a plurality of indices with an offset between each index and walks through the plurality of arrays by incrementing each index and simultaneously checking the first attribute data in corresponding entries.
Type: Application
Filed: August 31, 2020
Publication date: March 3, 2022
Inventors: John D. Pape, Brian R. Mestan, Peter G. Soderquist
-
Patent number: 10289331
Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multi-core processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one data buffer from a higher level buffer may be automatically deposited into the lower level buffer if room exists in the lower level buffer. Requesting strands of a core may fetch random data directly from its corresponding first level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
Type: Grant
Filed: September 26, 2018
Date of Patent: May 14, 2019
Assignee: Oracle International Corporation
Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
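The vertical pre-fetch buffer structure can be sketched with two levels (the abstract does not fix the number of levels or their capacities, so those are assumptions here): the RNG keeps the top buffer filled, data automatically trickles down whenever the lower level has room, and a requesting strand reads from its first-level buffer instead of performing a PIO access to the RNG.

```python
from collections import deque
import random


class BufferLevel:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = deque()

    def has_room(self):
        return len(self.data) < self.capacity


class RandomDataHierarchy:
    """Two-level vertical pre-fetch buffer fed by an RNG."""

    def __init__(self):
        self.rng_buffer = BufferLevel(8)   # higher level, filled by RNG
        self.core_buffer = BufferLevel(2)  # first-level, next to the core

    def rng_tick(self):
        # The RNG continuously generates random data into the top buffer.
        if self.rng_buffer.has_room():
            self.rng_buffer.data.append(random.getrandbits(64))
        self.trickle_down()

    def trickle_down(self):
        # Data is automatically deposited into the lower buffer
        # whenever room exists there.
        while self.core_buffer.has_room() and self.rng_buffer.data:
            self.core_buffer.data.append(self.rng_buffer.data.popleft())

    def fetch(self):
        # A requesting strand reads from its first-level buffer on
        # demand, avoiding a round trip to the RNG itself.
        if self.core_buffer.data:
            value = self.core_buffer.data.popleft()
            self.trickle_down()
            return value
        return None
```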
-
Publication number: 20190026040
Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multi-core processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one data buffer from a higher level buffer may be automatically deposited into the lower level buffer if room exists in the lower level buffer. Requesting strands of a core may fetch random data directly from its corresponding first level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
Type: Application
Filed: September 26, 2018
Publication date: January 24, 2019
Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
-
Patent number: 10114572
Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multi-core processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one data buffer from a higher level buffer may be automatically deposited into the lower level buffer if room exists in the lower level buffer. Requesting strands of a core may fetch random data directly from its corresponding first level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
Type: Grant
Filed: December 6, 2016
Date of Patent: October 30, 2018
Assignee: Oracle International Corporation
Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
-
Publication number: 20180157435
Abstract: Systems and methods for use in enhancing and dynamically allocating random data bandwidth among requesting cores in multi-core processors to reduce system latencies and increase system performance. In one arrangement, a multicore processor includes a vertical pre-fetch random data buffer structure that stores random data being continuously generated by a random data generator (RNG) so that such random data is ready for consumption upon request from one or more of a plurality of processing cores of the multicore processor. Random data received at one data buffer from a higher level buffer may be automatically deposited into the lower level buffer if room exists in the lower level buffer. Requesting strands of a core may fetch random data directly from its corresponding first level pre-fetch buffer on demand rather than having to trigger a PIO access or the like to fetch random data from the RNG.
Type: Application
Filed: December 6, 2016
Publication date: June 7, 2018
Inventors: Bruce J. Chang, Fred Tsai, John D. Pape
-
Patent number: 8949545
Abstract: A data processing device includes a load/store module to provide an interface between a processor device and a bus. In response to receiving a load or store instruction from the processor device, the load/store module determines a predicted coherency state of a cache line associated with the load or store instruction. Based on the predicted coherency state, the load/store module selects a bus transaction and communicates it to the bus. By selecting the bus transaction based on the predicted cache state, the load/store module does not have to wait for all pending bus transactions to be serviced, providing for greater predictability as to when bus transactions will be communicated to the bus, and allowing the bus behavior to be more easily simulated.
Type: Grant
Filed: December 4, 2008
Date of Patent: February 3, 2015
Assignee: Freescale Semiconductor, Inc.
Inventor: John D. Pape
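The selection step can be sketched against MESI-style coherency states, which is an assumption: the abstract does not name a specific protocol, and the transaction names below (`read_shared`, `upgrade`, `read_exclusive`) are illustrative. The point is that the transaction is chosen from the predicted state alone, without waiting for pending bus transactions to resolve.

```python
def select_bus_transaction(op, predicted_state):
    """Choose a bus transaction purely from the predicted coherency
    state (MESI-style: M, E, S, or I) of the cache line."""
    if op == "load":
        # Any valid state can satisfy a load locally; a miss (I) must
        # fetch the line with shared permission.
        return "none" if predicted_state in ("M", "E", "S") else "read_shared"
    if op == "store":
        if predicted_state in ("M", "E"):
            return "none"            # predicted writable: no bus traffic
        if predicted_state == "S":
            return "upgrade"         # invalidate the other sharers
        return "read_exclusive"      # predicted miss: fetch with ownership
    raise ValueError(f"unknown operation: {op!r}")
```

Because the decision depends only on `(op, predicted_state)`, the timing of bus transactions becomes deterministic with respect to the instruction stream, which is what makes the bus behavior easier to simulate.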
-
Publication number: 20100146217
Abstract: A data processing device includes a load/store module to provide an interface between a processor device and a bus. In response to receiving a load or store instruction from the processor device, the load/store module determines a predicted coherency state of a cache line associated with the load or store instruction. Based on the predicted coherency state, the load/store module selects a bus transaction and communicates it to the bus. By selecting the bus transaction based on the predicted cache state, the load/store module does not have to wait for all pending bus transactions to be serviced, providing for greater predictability as to when bus transactions will be communicated to the bus, and allowing the bus behavior to be more easily simulated.
Type: Application
Filed: December 4, 2008
Publication date: June 10, 2010
Applicant: Freescale Semiconductor, Inc.
Inventor: John D. Pape
-
Patent number: 7516205
Abstract: A system comprises a plurality of nodes coupled together wherein each node has access to associated memory. Further, each node is adapted to transmit a memory request to at least one other node while concurrently decoding the memory request to determine which node contains the memory targeted by the memory request.
Type: Grant
Filed: August 16, 2004
Date of Patent: April 7, 2009
Assignee: Hewlett-Packard Development Company, L.P.
Inventor: John D. Pape
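A minimal sketch of the overlap described here, with an assumed address-interleaved home-node mapping (the abstract does not specify how the decode works): the request is sent out to the other nodes at the same time the issuing node decodes which node owns the target address, rather than decoding first and then transmitting.

```python
def home_node(addr, num_nodes, region_size=1 << 20):
    # Hypothetical decode: addresses interleave across nodes by region.
    return (addr // region_size) % num_nodes


class Node:
    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.num_nodes = num_nodes

    def issue_request(self, addr):
        # Transmit to the other nodes while concurrently decoding the
        # home node, instead of serializing decode before transmit.
        recipients = [n for n in range(self.num_nodes) if n != self.node_id]
        target = home_node(addr, self.num_nodes)  # decode in parallel
        return recipients, target
```

In hardware, overlapping the two removes the decode latency from the request's critical path; non-target nodes simply drop the request once the decode result is known.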