Operand Prefetch, e.g., Prefetch Instruction, Address Prediction (EPO) Patents (Class 712/E9.047)
  • Patent number: 12230354
    Abstract: The present disclosure includes apparatuses and methods related to scatter/gather in a memory device. An example apparatus comprises a memory device that includes an array of memory cells, sensing circuitry, and a memory controller coupled to one another. The sensing circuitry includes a sense amplifier and a compute component configured to implement logical operations. A channel controller is configured to receive a block of instructions, the block of instructions including individual instructions for at least one of a gather operation and a scatter operation. The channel controller is configured to send individual instructions to the memory device and to control the memory controller such that the at least one of the gather operation and the scatter operation is executed on the memory device based on a corresponding one of the individual instructions.
    Type: Grant
    Filed: October 21, 2022
    Date of Patent: February 18, 2025
    Inventors: Jason T. Zawodny, Kelley D. Dobelstein, Timothy P. Finkbeiner, Richard C. Murphy
  • Patent number: 12229054
    Abstract: Aspects presented herein relate to methods and devices for graphics processing units including an apparatus. The apparatus may calculate a first average memory latency for a first configuration of a cache. Further, the apparatus may adjust the first configuration of the cache to a second configuration of the cache. The apparatus may calculate a second average memory latency for the second configuration of the cache. Further, the apparatus may adjust the second configuration to a third configuration of the cache. The apparatus may calculate a third average memory latency for the third configuration of the cache. The apparatus may output an indication of a lowest average memory latency of the first average memory latency, the second average memory latency, or the third average memory latency. Also, the apparatus may set, based on the lowest average memory latency, the cache to the first configuration, the second configuration, or the third configuration.
    Type: Grant
    Filed: March 31, 2023
    Date of Patent: February 18, 2025
    Assignee: QUALCOMM Incorporated
    Inventor: Suryanarayana Murthy Durbhakula
  • Patent number: 12216610
    Abstract: A microprocessor system comprises a computational array and a hardware data formatter. The computational array includes a plurality of computation units that each operates on a corresponding value addressed from memory. The values operated by the computation units are synchronously provided together to the computational array as a group of values to be processed in parallel. The hardware data formatter is configured to gather the group of values, wherein the group of values includes a first subset of values located consecutively in memory and a second subset of values located consecutively in memory. The first subset of values is not required to be located consecutively in the memory from the second subset of values.
    Type: Grant
    Filed: June 15, 2023
    Date of Patent: February 4, 2025
    Assignee: Tesla, Inc.
    Inventors: Emil Talpes, William McGee, Peter Joseph Bannon
  • Patent number: 12197335
    Abstract: Prefetch circuitry may be configured to transmit a message to cancel a prefetch of one or more cache blocks of a group. The message may correspond to a prefetch message by indicating an address for the group and a bit field for the one or more cache blocks of the group to cancel. In some implementations, the message may target a higher level cache to cancel prefetching the one or more cache blocks, and the message may be transmitted to the higher level cache via a lower level cache. In some implementations, the message may target a higher level cache to cancel prefetching the one or more cache blocks, the message may be transmitted to a lower level cache via a first command bus, and the lower level cache may forward the message to the higher level cache via a second command bus.
    Type: Grant
    Filed: March 13, 2023
    Date of Patent: January 14, 2025
    Assignee: SiFive, Inc.
    Inventors: Eric Andrew Gouldey, Wesley Waylon Terpstra, Michael Klinglesmith
  • Patent number: 12164429
    Abstract: Stride-based prefetcher circuits for prefetching data for next stride(s) of a cache read request into a cache memory based on identified stride patterns in the cache read request, and related processor-based systems and methods are disclosed. A stride-based prefetcher circuit (“prefetcher circuit”) observes cache read requests to a cache memory at run time to determine if a stride pattern exists. In response to detecting a stride pattern, the prefetcher circuit prefetches data from next memory location(s) in the detected stride from higher-level memory, and loads the prefetch data into the cache memory. The rationale is that once a stride is detected in the cache read requests to the cache memory, subsequent cache read requests are likely to continue with the same stride. The cache hit rate of the cache memory may be increased as a result.
    Type: Grant
    Filed: August 19, 2022
    Date of Patent: December 10, 2024
    Assignee: QUALCOMM Incorporated
    Inventor: Suryanarayana Murthy Durbhakula
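    Illustration (not from the patent): a minimal software sketch of stride detection as the abstract describes it, assuming a single tracked access stream. The class name `StridePrefetcher` and the parameters `degree` and `confirm` are hypothetical choices made for this example only.

```python
from typing import List, Optional


class StridePrefetcher:
    """Toy stride detector: trusts a stride once it repeats, then predicts ahead."""

    def __init__(self, degree: int = 2, confirm: int = 1) -> None:
        self.degree = degree      # number of strides ahead to prefetch (assumed)
        self.confirm = confirm    # repeats required before prefetching (assumed)
        self.last_addr: Optional[int] = None
        self.last_stride: Optional[int] = None
        self.matches = 0          # consecutive times the stride repeated

    def observe(self, addr: int) -> List[int]:
        """Record one cache read address and return addresses to prefetch."""
        prefetches: List[int] = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                self.matches += 1
            else:
                self.matches = 0  # pattern broken, start over
            self.last_stride = stride
            if self.matches >= self.confirm:
                prefetches = [addr + stride * k
                              for k in range(1, self.degree + 1)]
        self.last_addr = addr
        return prefetches


prefetcher = StridePrefetcher()
for address in (100, 164, 228, 1000):
    print(address, prefetcher.observe(address))
```

    With the defaults, the third access confirms a 64-byte stride and yields two prefetch candidates; the jump to address 1000 resets the detector.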
  • Patent number: 12164924
    Abstract: A method includes, in response to receiving an instruction to perform a first operation on first data stored in a memory device, obtaining first compression metadata from the memory device based on an address for the first data, and reducing a number of operations in a set of operations based on the first operation and one or more matching addresses, the one or more matching addresses corresponding to second compression metadata matching the first compression metadata.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: December 10, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Matthew Tomei, Shomit Das
  • Patent number: 12147810
    Abstract: A processor, an operation method, and a load-store device are provided. The processor is adapted to access a memory. The processor includes a vector register file (VRF) and the load-store device. The load-store device is coupled to the VRF. The load-store device performs a strided operation on the memory. In a current iteration of the strided operation, the load-store device reads a plurality of first data elements at a plurality of discrete addresses in the memory and writes the first data elements into the VRF, or the load-store device reads a plurality of second data elements from the VRF and writes the second data elements into a plurality of discrete addresses in the memory during the current iteration of the strided operation.
    Type: Grant
    Filed: June 22, 2022
    Date of Patent: November 19, 2024
    Assignee: ANDES TECHNOLOGY CORPORATION
    Inventor: Chia-Wei Hsu
  • Patent number: 12112173
    Abstract: An embodiment of an integrated circuit may comprise a branch predictor to predict whether a conditional branch is taken for one or more instructions, the branch predictor including circuitry to identify a loop branch instruction in the one or more instructions, and provide a branch prediction for the loop branch instruction based on a context of the loop branch instruction. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Ke Sun, Rodrigo Branco, Kekai Hu
  • Patent number: 12106109
    Abstract: The present disclosure relates to a data processing apparatus and related products. The data processing apparatus includes a decoding unit, a discrete-address determining unit, a continuous-data caching unit, a data read/write unit, and a storage unit. Through the data processing apparatus, the processing instruction may be decoded and executed. Discrete data may be transferred to a continuous data address, or continuous data may be stored to multiple discrete data addresses. As such, a vector computation of discrete data and vector data restoration after the vector computation may be implemented, which may simplify a processing process, thereby reducing data overhead.
    Type: Grant
    Filed: April 28, 2021
    Date of Patent: October 1, 2024
    Assignee: ANHUI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
    Inventors: Xuyan Ma, Jianhua Wu, Shaoli Liu, Xiangxuan Ge, Hanbo Liu, Lei Zhang
  • Patent number: 12093800
    Abstract: A device includes one or more processors configured to retrieve a first block of data, the data corresponding to an array of values arranged along at least a first dimension and a second dimension, to retrieve at least a portion of a second block of the data, and to perform a first hybrid convolution operation that applies a filter across the first block and at least the portion of the second block to generate output data. The output data includes a first accumulated block and at least a portion of a second accumulated block. The one or more processors are also configured to store the first accumulated block as first output data. The portion of the second block is adjacent to the first block along the first dimension and the portion of the second accumulated block is adjacent to the first accumulated block along the second dimension.
    Type: Grant
    Filed: February 2, 2021
    Date of Patent: September 17, 2024
    Assignee: QUALCOMM Incorporated
    Inventor: Eric Wayne Mahurin
  • Patent number: 12056049
    Abstract: An out-of-order buffer includes an out-of-order queue and a controlling circuit. The out-of-order queue includes a request sequence table and a request storage device. The controlling circuit receives and temporarily stores plural requests into the out-of-order queue. After the plural requests are transmitted to plural corresponding target devices, the controlling circuit retires the plural requests. The request sequence table contains m×n indicating units. The request sequence table contains m entry indicating rows. Each of the m entry indicating rows contains n indicating units. The request storage device includes m storage units corresponding to the m entry indicating rows in the request sequence table. A state indicating whether one request is stored in the corresponding storage unit of the m storage units is recorded in the request sequence table. The storage sequence of the plural requests is recorded in the request sequence table.
    Type: Grant
    Filed: November 18, 2022
    Date of Patent: August 6, 2024
    Assignee: RDC SEMICONDUCTOR CO., LTD.
    Inventors: Jyun-Yan Li, Po-Hsiang Huang, Ya-Ting Chen, Yao-An Tsai, Shu-Wei Yi
  • Patent number: 11989254
    Abstract: A method, system, apparatus and product for semantic meaning association to components of digital content. The method comprising obtaining a digital content, which comprises multiple visually separated components. The method comprises analyzing at least a portion of the digital content to extract features associated with a component and automatically determining, based on the extracted features, a semantic meaning of the component. The automatic determination is performed without relying on manually inputted hints in the digital content. The method further comprises automatically and without user intervention, performing an action associated with the digital content, wherein the action is determined based on the semantic meaning.
    Type: Grant
    Filed: September 10, 2021
    Date of Patent: May 21, 2024
    Assignee: Taboola.com Ltd.
    Inventors: Yotam Bar On, Yonatan Schvimer, Or Yaniv, Avi Yungelson
  • Patent number: 11972264
    Abstract: Processing circuitry performs processing operations in response to micro-operations. Front end circuitry supplies the micro-operations to be processed by the processing circuitry. Prediction circuitry generates a prediction of a number of loop iterations for which one or more micro-operations per loop iteration are to be supplied by the front end circuitry, where an actual number of loop iterations to be processed by the processing circuitry is resolvable by the processing circuitry based on at least one operand corresponding to a first loop iteration to be processed by the processing circuitry. The front end circuitry varies, based on a level of confidence in the prediction of the number of loop iterations, a supply rate with which the one or more micro-operations for at least a subset of the loop iterations are supplied to the processing circuitry.
    Type: Grant
    Filed: June 13, 2022
    Date of Patent: April 30, 2024
    Inventors: Guillaume Bolbenes, Thibaut Elie Lanois, Houdhaifa Bouzguarrou, Luca Nassi
  • Patent number: 11928524
    Abstract: The computer system includes one or more storage devices and a management computer, the management computer includes an information collection unit, an event detection unit, a plan generation unit, and a plan execution unit. The plan generation unit determines a target volume of a change process of a right of control in a plan, a processor of a change source of the right of control, and a processor of a change destination of the right of control, estimates an influence by a change process of the right of control in the plan, and the plan execution unit determines execution time of the plan based on the estimation of the influence and the operation information of the storage devices. As a result, in consideration of the influence by an ownership change process, while the influence applied to usage of a computer system is suppressed, the ownership change process is executed.
    Type: Grant
    Filed: February 26, 2021
    Date of Patent: March 12, 2024
    Assignee: Hitachi, Ltd.
    Inventors: Tsukasa Shibayama, Kazuei Hironaka, Kenta Sato
  • Patent number: 11861759
    Abstract: Embodiments are generally directed to memory prefetching in multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.
    Type: Grant
    Filed: January 20, 2022
    Date of Patent: January 2, 2024
    Assignee: INTEL CORPORATION
    Inventors: Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Nicolas Galoppo von Borries, Varghese George, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Mike Macpherson, Subramaniam Maiyuran
  • Patent number: 11861193
    Abstract: A system and method for updating a configuration of a host system so that the memory sub-system of the host system emulates performance characteristics of a target memory sub-system. An example system includes a memory sub-system; and a processor, operatively coupled with the memory sub-system, to perform operations comprising receiving a request to emulate a characteristic of a target memory sub-system, identifying a candidate configuration that generates a load on a memory sub-system of a host system to decrease a characteristic of the memory sub-system of the host system, and updating a configuration of the host system based at least on the candidate configuration, wherein the updated configuration changes the memory sub-system of the host system to emulate the characteristic of the target memory sub-system.
    Type: Grant
    Filed: January 13, 2023
    Date of Patent: January 2, 2024
    Assignee: Micron Technology, Inc.
    Inventors: Jacob Mulamootil Jacob, John M. Groves, Steven Moyer
  • Patent number: 11847460
    Abstract: Apparatuses and methods for handling load requests are disclosed. In response to a load request specifying a data item to retrieve from memory, a series of data items comprising the data item identified by the load request are retrieved. Load requests are buffered prior to the load requests being carried out. Coalescing circuitry determines for the load request and a set of one or more other load requests buffered in the pending load buffer circuitry whether an address proximity condition is true. The address proximity condition is true when all data items identified by the set of one or more other load requests are comprised within the series of data items. When the address proximity condition is true, the set of one or more other load requests are suppressed. Coalescing prediction circuitry generates a coalescing prediction for each load request based on previous handling of load requests by the coalescing circuitry.
    Type: Grant
    Filed: March 24, 2021
    Date of Patent: December 19, 2023
    Assignee: Arm Limited
    Inventors: Mbou Eyole, Michiel Willem Van Tol
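    Illustration (not from the patent): a small sketch of the address proximity condition, assuming the "series of data items" retrieved for a load is the 64-byte-aligned block containing it. The names `Load`, `series_bounds`, and `coalesce`, and the block size, are hypothetical; the coalescing prediction circuitry is not modeled.

```python
from typing import List, Tuple

Load = Tuple[int, int]  # (byte address, size in bytes)


def series_bounds(load: Load, block: int = 64) -> Tuple[int, int]:
    """Bounds [start, end) of the aligned block fetched for this load (assumed)."""
    start = load[0] - load[0] % block
    return start, start + block


def coalesce(lead: Load, pending: List[Load],
             block: int = 64) -> Tuple[List[Load], List[Load]]:
    """Split pending loads into those the lead's fetch already covers and the rest."""
    lo, hi = series_bounds(lead, block)
    suppressed: List[Load] = []
    remaining: List[Load] = []
    for addr, size in pending:
        # Proximity holds only if the whole data item lies inside the block.
        if lo <= addr and addr + size <= hi:
            suppressed.append((addr, size))
        else:
            remaining.append((addr, size))
    return suppressed, remaining


covered, still_needed = coalesce((0x100, 8),
                                 [(0x108, 8), (0x13C, 8), (0x200, 4)])
print(covered, still_needed)
```

    In the usage lines, only the load at 0x108 falls entirely inside the block fetched for the lead load, so it alone is suppressed; the load at 0x13C straddles the block boundary and must still be issued.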
  • Patent number: 11797307
    Abstract: In response to an instruction decoder decoding a range prefetch instruction specifying first and second address-range-specifying parameters and a stride parameter, prefetch circuitry controls, depending on the first and second address-range-specifying parameters and the stride parameter, prefetching of data from a plurality of specified ranges of addresses into the at least one cache. A start address and size of each specified range is dependent on the first and second address-range-specifying parameters. The stride parameter specifies an offset between start addresses of successive specified ranges. Use of the range prefetch instruction helps to improve programmability and improve the balance between prefetch coverage and circuit area of the prefetch circuitry.
    Type: Grant
    Filed: June 23, 2021
    Date of Patent: October 24, 2023
    Assignee: Arm Limited
    Inventors: Krishnendra Nathella, David Hennah Mansell, Alejandro Rico Carro, Andrew Mundy
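    Illustration (not from the patent): a sketch of which cache lines a range prefetch with a stride might cover. It assumes the two range-specifying parameters are a base address and a size in bytes, and adds a hypothetical `count` of ranges and a 64-byte line size; none of these encodings are taken from the patent.

```python
from typing import Iterator, List, Tuple

LINE = 64  # assumed cache line size in bytes


def specified_ranges(base: int, size: int, stride: int,
                     count: int) -> Iterator[Tuple[int, int]]:
    """Yield [start, end) for each range; starts are `stride` bytes apart."""
    for i in range(count):
        start = base + i * stride
        yield start, start + size


def lines_to_prefetch(base: int, size: int, stride: int, count: int,
                      line: int = LINE) -> List[int]:
    """List the distinct line-aligned addresses touched by all ranges."""
    lines: List[int] = []
    for start, end in specified_ranges(base, size, stride, count):
        first = start - start % line  # align down to a line boundary
        for addr in range(first, end, line):
            if addr not in lines:
                lines.append(addr)
    return lines


print([hex(a) for a in lines_to_prefetch(0x1000, 100, 256, 2)])
```

    A single call thus describes a strided 2D access pattern (for example, a tile of a matrix) that would otherwise need one prefetch per row.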
  • Patent number: 11726917
    Abstract: A method includes recording a first set of consecutive memory access deltas, where each of the consecutive memory access deltas represents a difference between two memory addresses accessed by an application, updating values in a prefetch training table based on the first set of memory access deltas, and predicting one or more memory addresses for prefetching responsive to a second set of consecutive memory access deltas and based on values in the prefetch training table.
    Type: Grant
    Filed: July 13, 2020
    Date of Patent: August 15, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Susumu Mashimo, John Kalamatianos
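    Illustration (not from the patent): a toy delta-history predictor in the spirit of the abstract. The training table here maps the most recent deltas to counts of the delta that followed them; the class name `DeltaPrefetcher`, the `history` length, and the counting scheme are assumptions for this sketch.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class DeltaPrefetcher:
    """Predict the next address from the last `history` address deltas."""

    def __init__(self, history: int = 2) -> None:
        self.history = history
        # Training table: recent delta pattern -> counts of the following delta.
        self.table: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
        self.deltas: Deque[int] = deque(maxlen=history)
        self.last_addr: Optional[int] = None

    def access(self, addr: int) -> Optional[int]:
        """Train on this access and return a predicted address, if any."""
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if len(self.deltas) == self.history:
                self.table[tuple(self.deltas)][delta] += 1  # update table
            self.deltas.append(delta)
        self.last_addr = addr
        if len(self.deltas) == self.history:
            votes = self.table.get(tuple(self.deltas))
            if votes:
                best_delta = votes.most_common(1)[0][0]
                return addr + best_delta
        return None


predictor = DeltaPrefetcher()
print([predictor.access(a) for a in (0, 1, 3, 4, 6, 7)])
```

    The access sequence in the usage line alternates deltas of 1 and 2, a pattern a single-stride detector would not lock onto but a delta-history table can learn.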
  • Patent number: 11620133
    Abstract: Systems and methods for reusing load instructions by a processor without accessing a data cache include a load store execution unit (LSU) of the processor, the LSU being configured to determine if a prior execution of a first load instruction loaded data from a first cache line of the data cache and determine if a current execution of a second load instruction will load the data from the first cache line of the data cache. Further, the LSU also determines if a reuse of the data from the prior execution of the first load instruction for the current execution of the second load instruction will lead to functional errors. If there are no functional errors, the data from the prior execution of the first load instruction is reused for the current execution of the second load instruction, without accessing the data cache for the current execution of the second load instruction.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: April 4, 2023
    Assignee: Qualcomm Incorporated
    Inventor: Vignyan Reddy Kothinti Naresh
  • Patent number: 11593113
    Abstract: Unaligned atomic memory operations on a processor using a load-store instruction set architecture (ISA) that requires aligned accesses are performed by widening the memory access to an aligned address by the next larger power of two (e.g., 4-byte access is widened to 8 bytes, and 8-byte access is widened to 16 bytes). Data processing operations supported by the load-store ISA including shift, rotate, and bitfield manipulation are utilized to modify only the bytes in the original unaligned address so that the atomic memory operations are aligned to the widened access address. The aligned atomic memory operations using the widened accesses avoid the faulting exceptions associated with unaligned access for most 4-byte and 8-byte accesses. Exception handling is performed in cases in which memory access spans a 16-byte boundary.
    Type: Grant
    Filed: October 4, 2021
    Date of Patent: February 28, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Darek Mihocka, Arun Upadhyaya Kishan, Pedro Miguel Sequeira De Justo Teixeira
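    Illustration (not from the patent): a sketch of the address arithmetic behind widening an unaligned access. It assumes little-endian byte order and assumes the smallest aligned power-of-two window (up to 16 bytes) that contains the access is chosen; the function names `widen` and `splice` are hypothetical.

```python
from typing import Optional, Tuple


def widen(addr: int, size: int) -> Optional[Tuple[int, int, int]]:
    """Return (aligned base, widened size, bit shift), or None if the access
    spans a 16-byte boundary and needs the exception-handling path."""
    if size not in (4, 8):
        raise ValueError("sketch handles 4-byte and 8-byte accesses only")
    width = size * 2  # next larger power of two
    while width <= 16:
        base = addr - addr % width
        if addr + size <= base + width:
            return base, width, (addr - base) * 8
        width *= 2
    return None


def splice(wide_value: int, new_value: int, size: int, shift: int) -> int:
    """Replace only the original bytes inside the widened value (shift and mask)."""
    mask = ((1 << (size * 8)) - 1) << shift
    return (wide_value & ~mask) | ((new_value << shift) & mask)


print(widen(0x1003, 4))   # fits in an aligned 8-byte window
print(widen(0x100E, 4))   # crosses a 16-byte boundary
print(hex(splice(0x1122334455667788, 0xAABBCCDD, 4, 24)))
```

    In a real implementation the spliced value would be written back with an aligned atomic compare-and-swap on the widened location, retrying on failure; that loop is omitted here.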
  • Patent number: 11573739
    Abstract: An information processing apparatus includes: a first memory; a second memory different in processing speed from the first memory; and a processor including: a memory controller that is coupled to the first memory and the second memory and that controls an access to the first memory and an access to the second memory; and a plurality of controllers that access the first memory or the second memory. The processor is configured to suppress a writing frequency of data into the second memory by controlling one or more first controllers that access the second memory among the plurality of controllers in accordance with a result of monitoring a state of writing the data into the second memory.
    Type: Grant
    Filed: January 25, 2021
    Date of Patent: February 7, 2023
    Assignee: FUJITSU LIMITED
    Inventor: Satoshi Imamura
  • Patent number: 11449429
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register specifies a circular address mode for the loop, first and second block size numbers and a circular address block size selection. For a first circular address block size selection the block size corresponds to the first block size number. For a second circular address block size selection the block size corresponds to a sum of the first block size number and the second block size number.
    Type: Grant
    Filed: April 20, 2021
    Date of Patent: September 20, 2022
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
  • Patent number: 11403082
    Abstract: Systems and methods are configured to receive code containing an original loop that includes irregular memory accesses. The original loop can be split. A pre-execution loop that contains code to prefetch content of the memory can be generated. Execution of the pre-execution loop can access memory inclusively between a starting location and the starting location plus a prefetch distance. A modified loop that can perform at least one computation based on the content prefetched with execution of the pre-execution loop can be generated. Execution of the main loop can follow the execution of the pre-execution loop. The original loop can be replaced with the pre-execution loop and the modified loop.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: August 2, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Sanyam Mehta, Gary William Elsesser, Terry D. Greyzck
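    Illustration (not from the patent): a sketch of the shape of the loop-splitting transformation for an indirect access pattern `a[idx[i]]`. Python has no prefetch instruction, so `prefetch` below is a hypothetical stand-in that only logs the index it would touch; the chunking scheme is likewise an assumption.

```python
from typing import List


def prefetch(log: List[int], index: int) -> None:
    """Stand-in for a hardware prefetch: just records the index requested."""
    log.append(index)


def original_loop(a: List[int], idx: List[int]) -> int:
    """Original loop with irregular (indirect) memory accesses."""
    total = 0
    for i in range(len(idx)):
        total += a[idx[i]]
    return total


def split_loop(a: List[int], idx: List[int], distance: int,
               log: List[int]) -> int:
    """Replacement: per chunk, a pre-execution loop then the modified loop."""
    total = 0
    n = len(idx)
    for start in range(0, n, distance + 1):
        stop = min(start + distance, n - 1)  # inclusive upper bound
        for i in range(start, stop + 1):     # pre-execution loop: prefetch only
            prefetch(log, idx[i])
        for i in range(start, stop + 1):     # modified loop: the computation
            total += a[idx[i]]
    return total


data = list(range(10))
indices = [7, 2, 9, 0, 4]
touched: List[int] = []
print(split_loop(data, indices, 1, touched), original_loop(data, indices))
```

    Both versions compute the same sum; the split version issues all prefetches for a chunk before any of that chunk's computation, which is what lets the memory latency overlap.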
  • Patent number: 9811341
    Abstract: Disclosed is an apparatus and method to manage instruction cache prefetching from an instruction cache. A processor may comprise: a prefetch engine; a branch prediction engine to predict the outcome of a branch; and a dynamic optimizer. The dynamic optimizer may be used to control: identifying common instruction cache misses and inserting a prefetch instruction from the prefetch engine to the instruction cache.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: November 7, 2017
    Assignee: Intel Corporation
    Inventors: Kyriakos A. Stavrou, Enric Gibert Codina, Josep M. Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Fernando Latorre, Pedro Lopez, Marc Lupon, Carlos Madriles Gimeno, Grigorios Magklis, Pedro Marcuello, Alejandro Martinez Vicente, Raul Martinez, Daniel Ortega, Demos Pavlou, Georgios Tournavitis, Polychronis Xekalakis
  • Patent number: 8918626
    Abstract: The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. When an instruction retires during the lookahead mode, a working register which serves as a destination register for the instruction is not copied to a corresponding architectural register. Instead the architectural register is marked as invalid. Note that by not updating architectural registers during lookahead mode, the system eliminates the need to checkpoint the architectural registers prior to entering lookahead mode.
    Type: Grant
    Filed: November 10, 2011
    Date of Patent: December 23, 2014
    Assignee: Oracle International Corporation
    Inventors: Yuan C. Chou, Eric W. Mahurin
  • Patent number: 8195888
    Abstract: Technologies are generally described for allocating available prefetch bandwidth among processor cores in a multiprocessor computing system. The prefetch bandwidth associated with an off-chip memory interface of the multiprocessor may be determined, partitioned, and allocated across multiple processor cores.
    Type: Grant
    Filed: March 20, 2009
    Date of Patent: June 5, 2012
    Assignee: Empire Technology Development LLC
    Inventor: Yan Solihin
  • Patent number: 8156286
    Abstract: A microprocessor includes a cache memory, a prefetch unit, and detection logic. The prefetch unit may be configured to monitor memory accesses that miss in the cache and to determine whether to prefetch one or more blocks of memory from a system memory based upon previous memory accesses. The prefetch unit may be further configured to use addresses of the memory accesses that miss to calculate each next memory block to prefetch. The detection logic may be configured to provide a notification to the prefetch unit in response to detecting a memory access instruction including a particular hint. In response to receiving the notification, the prefetch unit may be configured to inhibit using an address associated with the memory access instruction including the particular hint, when calculating subsequent memory blocks to prefetch.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: April 10, 2012
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Thomas M. Deneau
  • Patent number: 8032713
    Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design is provided. The design structure generally includes a computer system that includes a CPU, a storage device, circuitry for providing a speculative access threshold corresponding to a selected percentage of the total number of accesses to the storage device that can be speculatively issued, and circuitry for intermixing demand accesses and speculative accesses in accordance with the speculative access threshold.
    Type: Grant
    Filed: May 5, 2008
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: James J. Allen, Jr., Steven K. Jenkins, James A. Mossman, Michael R. Trombley
  • Patent number: 7949830
    Abstract: A system and method for handling speculative read requests for a memory controller in a computer system are provided. In one example, a method includes the steps of providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and intermixing demand reads and speculative reads in accordance with the speculative read threshold. In another example, a computer system includes a CPU, a memory controller, memory, a bus connecting the CPU, memory controller and memory, circuitry for providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and circuitry for intermixing demand reads and speculative reads in accordance with the speculative read threshold.
    Type: Grant
    Filed: December 10, 2007
    Date of Patent: May 24, 2011
    Assignee: International Business Machines Corporation
    Inventors: James Johnson Allen, Jr., Steven Kenneth Jenkins, James A. Mossman, Michael Raymond Trombley
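    Illustration (not from the patent): one possible reading of intermixing demand and speculative reads under a percentage threshold. Here a speculative read is issued only while speculative reads stay at or below `threshold_pct` percent of all reads issued so far; the function name `intermix` and this exact policy are assumptions for the sketch.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def intermix(demand: Iterable[int], speculative: Iterable[int],
             threshold_pct: int) -> List[Tuple[str, int]]:
    """Return the issue order of reads; leftover speculative reads are dropped."""
    d: Deque[int] = deque(demand)
    s: Deque[int] = deque(speculative)
    issued: List[Tuple[str, int]] = []
    spec_count = 0
    while d:
        issued.append(("demand", d.popleft()))  # demand reads always issue
        # Issue speculative reads while their share stays within the threshold.
        while s and (spec_count + 1) * 100 <= threshold_pct * (len(issued) + 1):
            issued.append(("spec", s.popleft()))
            spec_count += 1
    return issued


for kind, addr in intermix([1, 2, 3], [10, 11, 12, 13], 50):
    print(kind, addr)
```

    With a 50 percent threshold the two kinds of read alternate; with a threshold of 0 only demand reads are issued.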
  • Patent number: 7937533
    Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design is provided. The design structure generally includes a computer system that includes a CPU, a memory controller, memory, a bus connecting the CPU, memory controller and memory, circuitry for providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and circuitry for intermixing demand reads and speculative reads in accordance with the speculative read threshold.
    Type: Grant
    Filed: May 4, 2008
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: James J. Allen, Jr., Steven K. Jenkins, James A. Mossman, Michael R. Trombley