Operand Prefetch, E.g., Prefetch Instruction, Address Prediction (epo) Patents (Class 712/E9.047)
-
Patent number: 12230354
Abstract: The present disclosure includes apparatuses and methods related to scatter/gather in a memory device. An example apparatus comprises a memory device that includes an array of memory cells, sensing circuitry, and a memory controller coupled to one another. The sensing circuitry includes a sense amplifier and a compute component configured to implement logical operations. A channel controller is configured to receive a block of instructions, the block of instructions including individual instructions for at least one of a gather operation and a scatter operation. The channel controller is configured to send individual instructions to the memory device and to control the memory controller such that the at least one of the gather operation and the scatter operation is executed on the memory device based on a corresponding one of the individual instructions.
Type: Grant
Filed: October 21, 2022
Date of Patent: February 18, 2025
Inventors: Jason T. Zawodny, Kelley D. Dobelstein, Timothy P. Finkbeiner, Richard C. Murphy
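The gather/scatter semantics the abstract describes can be illustrated in software terms. The patent performs these operations inside the memory device itself; this sketch only shows the data movement, with `memory` standing in for the cell array and `indices` for the discrete addresses (all names here are illustrative, not from the patent):

```python
def gather(memory, indices):
    """Gather: collect values from scattered addresses into a dense list."""
    return [memory[i] for i in indices]

def scatter(memory, indices, values):
    """Scatter: write a dense list of values out to scattered addresses."""
    for i, v in zip(indices, values):
        memory[i] = v

mem = [0, 10, 20, 30, 40, 50]
dense = gather(mem, [5, 1, 3])    # collect mem[5], mem[1], mem[3]
scatter(mem, [0, 2], [99, 77])    # write 99 to mem[0], 77 to mem[2]
```

The point of doing this in-memory, as the patent proposes, is that the scattered reads and writes never cross the memory bus individually.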
-
Patent number: 12229054
Abstract: Aspects presented herein relate to methods and devices for graphics processing units including an apparatus. The apparatus may calculate a first average memory latency for the first configuration of the cache. Further, the apparatus may adjust the first configuration of the cache to a second configuration of the cache. The apparatus may calculate a second average memory latency for the second configuration of the cache. Further, the apparatus may adjust the second configuration to a third configuration of the cache. The apparatus may calculate a third average memory latency for the third configuration of the cache. The apparatus may output an indication of the lowest average memory latency of the first average memory latency, the second average memory latency, or the third average memory latency. Also, the apparatus may set, based on the lowest average memory latency, the cache to the first configuration, the second configuration, or the third configuration.
Type: Grant
Filed: March 31, 2023
Date of Patent: February 18, 2025
Assignee: QUALCOMM Incorporated
Inventor: Suryanarayana Murthy Durbhakula
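The search the abstract walks through (measure, adjust, measure again, keep the configuration with the lowest average latency) can be sketched as a simple loop. `measure_latency` and the configuration names below are hypothetical stand-ins, not anything specified in the patent:

```python
def pick_cache_config(configs, measure_latency):
    """Try each cache configuration in turn, measure its average memory
    latency, and return the configuration with the lowest latency."""
    best_config, best_latency = None, float("inf")
    for cfg in configs:
        latency = measure_latency(cfg)
        if latency < best_latency:
            best_config, best_latency = cfg, latency
    return best_config, best_latency

# Hypothetical measured latencies for three cache configurations.
latencies = {"cfg1": 12.0, "cfg2": 9.5, "cfg3": 10.7}
best, lat = pick_cache_config(latencies, latencies.__getitem__)
```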
-
Patent number: 12216610
Abstract: A microprocessor system comprises a computational array and a hardware data formatter. The computational array includes a plurality of computation units that each operates on a corresponding value addressed from memory. The values operated on by the computation units are synchronously provided together to the computational array as a group of values to be processed in parallel. The hardware data formatter is configured to gather the group of values, wherein the group of values includes a first subset of values located consecutively in memory and a second subset of values located consecutively in memory. The first subset of values is not required to be located consecutively in memory with the second subset of values.
Type: Grant
Filed: June 15, 2023
Date of Patent: February 4, 2025
Assignee: Tesla, Inc.
Inventors: Emil Talpes, William McGee, Peter Joseph Bannon
-
Patent number: 12197335
Abstract: Prefetch circuitry may be configured to transmit a message to cancel a prefetch of one or more cache blocks of a group. The message may correspond to a prefetch message by indicating an address for the group and a bit field for the one or more cache blocks of the group to cancel. In some implementations, the message may target a higher level cache to cancel prefetching the one or more cache blocks, and the message may be transmitted to the higher level cache via a lower level cache. In some implementations, the message may target a higher level cache to cancel prefetching the one or more cache blocks, the message may be transmitted to a lower level cache via a first command bus, and the lower level cache may forward the message to the higher level cache via a second command bus.
Type: Grant
Filed: March 13, 2023
Date of Patent: January 14, 2025
Assignee: SiFive, Inc.
Inventors: Eric Andrew Gouldey, Wesley Waylon Terpstra, Michael Klinglesmith
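A minimal sketch of the cancel-message encoding the abstract describes: a group address plus a bit field with one bit per cache block to cancel. The 8-block group width here is an assumption for illustration; the patent does not fix these widths:

```python
BLOCKS_PER_GROUP = 8  # assumed group width, not specified by the patent

def make_cancel_message(group_addr, block_indices):
    """Encode a prefetch-cancel message: the group's base address plus a
    bit field with one bit set per cache block to cancel."""
    bits = 0
    for idx in block_indices:
        assert 0 <= idx < BLOCKS_PER_GROUP
        bits |= 1 << idx
    return (group_addr, bits)

def blocks_to_cancel(message):
    """Decode the bit field back into the block indices to cancel."""
    addr, bits = message
    return [i for i in range(BLOCKS_PER_GROUP) if bits & (1 << i)]

msg = make_cancel_message(0x4000, [0, 3, 5])
```

Packing the blocks into one bit field lets a single message cancel several outstanding prefetches in the targeted higher level cache.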
-
Patent number: 12164429
Abstract: Stride-based prefetcher circuits for prefetching data for the next stride(s) of a cache read request into a cache memory based on identified stride patterns in the cache read requests, and related processor-based systems and methods, are disclosed. A stride-based prefetcher circuit ("prefetcher circuit") observes cache read requests to a cache memory at run time to determine if a stride pattern exists. In response to detecting a stride pattern, the prefetcher circuit prefetches data from the next memory location(s) in the detected stride from higher-level memory and loads the prefetched data into the cache memory. This is because, once a stride is detected in the cache read requests to the cache memory, subsequent cache read requests will more likely than not continue with the same stride. The cache hit rate of the cache memory may be increased as a result.
Type: Grant
Filed: August 19, 2022
Date of Patent: December 10, 2024
Assignee: QUALCOMM Incorporated
Inventor: Suryanarayana Murthy Durbhakula
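A run-time stride detector of this general shape might look like the following sketch. The confidence threshold and prefetch degree are illustrative parameters, not values taken from the patent:

```python
class StridePrefetcher:
    """Minimal stride detector: when successive accesses advance by a
    constant stride, predict the next addresses to prefetch."""
    def __init__(self, confidence_threshold=2, degree=2):
        self.last_addr = None
        self.last_stride = None
        self.confidence = 0
        self.threshold = confidence_threshold
        self.degree = degree  # how many strides ahead to prefetch

    def observe(self, addr):
        """Record a cache read address; return addresses to prefetch."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                self.confidence += 1
            else:
                self.confidence = 0
            self.last_stride = stride
            if self.confidence >= self.threshold:
                prefetches = [addr + stride * i
                              for i in range(1, self.degree + 1)]
        self.last_addr = addr
        return prefetches

pf = StridePrefetcher()
pf.observe(100); pf.observe(164); pf.observe(228)  # stride 64 repeats
hits = pf.observe(292)  # confidence reached: predict the next strides
```

Requiring the stride to repeat before prefetching is what keeps a detector like this from polluting the cache on irregular access patterns.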
-
Patent number: 12164924
Abstract: A method includes, in response to receiving an instruction to perform a first operation on first data stored in a memory device, obtaining first compression metadata from the memory device based on an address for the first data, and reducing a number of operations in a set of operations based on the first operation and one or more matching addresses, the one or more matching addresses corresponding to second compression metadata matching the first compression metadata.
Type: Grant
Filed: September 25, 2020
Date of Patent: December 10, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Matthew Tomei, Shomit Das
-
Patent number: 12147810
Abstract: A processor, an operation method, and a load-store device are provided. The processor is adapted to access a memory. The processor includes a vector register file (VRF) and the load-store device. The load-store device is coupled to the VRF. The load-store device performs a strided operation on the memory. In a current iteration of the strided operation, the load-store device reads a plurality of first data elements at a plurality of discrete addresses in the memory and writes the first data elements into the VRF, or the load-store device reads a plurality of second data elements from the VRF and writes the second data elements into a plurality of discrete addresses in the memory during the current iteration of the strided operation.
Type: Grant
Filed: June 22, 2022
Date of Patent: November 19, 2024
Assignee: ANDES TECHNOLOGY CORPORATION
Inventor: Chia-Wei Hsu
-
Patent number: 12112173
Abstract: An embodiment of an integrated circuit may comprise a branch predictor to predict whether a conditional branch is taken for one or more instructions, the branch predictor including circuitry to identify a loop branch instruction in the one or more instructions, and provide a branch prediction for the loop branch instruction based on a context of the loop branch instruction. Other embodiments are disclosed and claimed.
Type: Grant
Filed: December 21, 2020
Date of Patent: October 8, 2024
Assignee: Intel Corporation
Inventors: Ke Sun, Rodrigo Branco, Kekai Hu
-
Patent number: 12106109
Abstract: The present disclosure relates to a data processing apparatus and related products. The data processing apparatus includes a decoding unit, a discrete-address determining unit, a continuous-data caching unit, a data read/write unit, and a storage unit. Through the data processing apparatus, the processing instruction may be decoded and executed. Discrete data may be transferred to a continuous data address, or continuous data may be stored to multiple discrete data addresses. As such, a vector computation of discrete data and vector data restoration after the vector computation may be implemented, which may simplify a processing process, thereby reducing data overhead.
Type: Grant
Filed: April 28, 2021
Date of Patent: October 1, 2024
Assignee: ANHUI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
Inventors: Xuyan Ma, Jianhua Wu, Shaoli Liu, Xiangxuan Ge, Hanbo Liu, Lei Zhang
-
Patent number: 12093800
Abstract: A device includes one or more processors configured to retrieve a first block of data, the data corresponding to an array of values arranged along at least a first dimension and a second dimension, to retrieve at least a portion of a second block of the data, and to perform a first hybrid convolution operation that applies a filter across the first block and at least the portion of the second block to generate output data. The output data includes a first accumulated block and at least a portion of a second accumulated block. The one or more processors are also configured to store the first accumulated block as first output data. The portion of the second block is adjacent to the first block along the first dimension and the portion of the second accumulated block is adjacent to the first accumulated block along the second dimension.
Type: Grant
Filed: February 2, 2021
Date of Patent: September 17, 2024
Assignee: QUALCOMM Incorporated
Inventor: Eric Wayne Mahurin
-
Patent number: 12056049
Abstract: An out-of-order buffer includes an out-of-order queue and a controlling circuit. The out-of-order queue includes a request sequence table and a request storage device. The controlling circuit receives and temporarily stores the plural requests into the out-of-order queue. After the plural requests are transmitted to plural corresponding target devices, the controlling circuit retires the plural requests. The request sequence table contains m×n indicating units, arranged as m entry indicating rows of n indicating units each. The request storage device includes m storage units corresponding to the m entry indicating rows in the request sequence table. The state indicating whether a request is stored in the corresponding one of the m storage units is recorded in the request sequence table, as is the storage sequence of the plural requests.
Type: Grant
Filed: November 18, 2022
Date of Patent: August 6, 2024
Assignee: RDC SEMICONDUCTOR CO., LTD.
Inventors: Jyun-Yan Li, Po-Hsiang Huang, Ya-Ting Chen, Yao-An Tsai, Shu-Wei Yi
-
Patent number: 11989254
Abstract: A method, system, apparatus and product for semantic meaning association to components of digital content. The method comprising obtaining a digital content, which comprises multiple visually separated components. The method comprises analyzing at least a portion of the digital content to extract features associated with a component and automatically determining, based on the extracted features, a semantic meaning of the component. The automatic determination is performed without relying on manually inputted hints in the digital content. The method further comprises automatically and without user intervention, performing an action associated with the digital content, wherein the action is determined based on the semantic meaning.
Type: Grant
Filed: September 10, 2021
Date of Patent: May 21, 2024
Assignee: Taboola.com Ltd.
Inventors: Yotam Bar On, Yonatan Schvimer, Or Yaniv, Avi Yungelson
-
Patent number: 11972264
Abstract: Processing circuitry performs processing operations in response to micro-operations. Front end circuitry supplies the micro-operations to be processed by the processing circuitry. Prediction circuitry generates a prediction of a number of loop iterations for which one or more micro-operations per loop iteration are to be supplied by the front end circuitry, where an actual number of loop iterations to be processed by the processing circuitry is resolvable by the processing circuitry based on at least one operand corresponding to a first loop iteration to be processed by the processing circuitry. The front end circuitry varies, based on a level of confidence in the prediction of the number of loop iterations, a supply rate with which the one or more micro-operations for at least a subset of the loop iterations are supplied to the processing circuitry.
Type: Grant
Filed: June 13, 2022
Date of Patent: April 30, 2024
Inventors: Guillaume Bolbenes, Thibaut Elie Lanois, Houdhaifa Bouzguarrou, Luca Nassi
-
Patent number: 11928524
Abstract: The computer system includes one or more storage devices and a management computer. The management computer includes an information collection unit, an event detection unit, a plan generation unit, and a plan execution unit. The plan generation unit determines a target volume of a change process of a right of control in a plan, a processor of a change source of the right of control, and a processor of a change destination of the right of control, and estimates an influence of the change process of the right of control in the plan; the plan execution unit determines the execution time of the plan based on the estimation of the influence and the operation information of the storage devices. As a result, the ownership change process is executed in consideration of its influence, while the influence on usage of the computer system is suppressed.
Type: Grant
Filed: February 26, 2021
Date of Patent: March 12, 2024
Assignee: Hitachi, Ltd.
Inventors: Tsukasa Shibayama, Kazuei Hironaka, Kenta Sato
-
Patent number: 11861759
Abstract: Embodiments are generally directed to memory prefetching in a multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.
Type: Grant
Filed: January 20, 2022
Date of Patent: January 2, 2024
Assignee: INTEL CORPORATION
Inventors: Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Nicolas Galoppo von Borries, Varghese George, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Mike Macpherson, Subramaniam Maiyuran
-
Patent number: 11861193
Abstract: A system and method for updating a configuration of a host system so that the memory sub-system of the host system emulates performance characteristics of a target memory sub-system. An example system includes a memory sub-system; and a processor, operatively coupled with the memory sub-system, to perform operations comprising receiving a request to emulate a characteristic of a target memory sub-system, identifying a candidate configuration that generates a load on a memory sub-system of a host system to decrease a characteristic of the memory sub-system of the host system, and updating a configuration of the host system based at least on the candidate configuration, wherein the updated configuration changes the memory sub-system of the host system to emulate the characteristic of the target memory sub-system.
Type: Grant
Filed: January 13, 2023
Date of Patent: January 2, 2024
Assignee: Micron Technology, Inc.
Inventors: Jacob Mulamootil Jacob, John M. Groves, Steven Moyer
-
Patent number: 11847460
Abstract: Apparatuses and methods for handling load requests are disclosed. In response to a load request specifying a data item to retrieve from memory, a series of data items comprising the data item identified by the load request are retrieved. Load requests are buffered prior to the load requests being carried out. Coalescing circuitry determines for the load request and a set of one or more other load requests buffered in the pending load buffer circuitry whether an address proximity condition is true. The address proximity condition is true when all data items identified by the set of one or more other load requests are comprised within the series of data items. When the address proximity condition is true, the set of one or more other load requests are suppressed. Coalescing prediction circuitry generates a coalescing prediction for each load request based on previous handling of load requests by the coalescing circuitry.
Type: Grant
Filed: March 24, 2021
Date of Patent: December 19, 2023
Assignee: Arm Limited
Inventors: Mbou Eyole, Michiel Willem Van Tol
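The address proximity condition can be illustrated with a small sketch. It assumes the retrieved series of data items is a single 64-byte aligned span; that width is an assumption for illustration, not something the patent specifies:

```python
LINE_BYTES = 64  # assumed width of one retrieved series of data items

def can_coalesce(lead_addr, other_addrs, width=LINE_BYTES):
    """Address proximity condition: every other buffered load must fall
    entirely within the series of data items fetched for the lead load.
    If so, the other loads can be suppressed (coalesced into the lead)."""
    base = (lead_addr // width) * width   # start of the retrieved series
    return all(base <= a < base + width for a in other_addrs)

# Loads at 0x1010 and 0x1030 fall inside the 64-byte series starting at
# 0x1000 that serves the load at 0x1008, so they can be suppressed;
# a load at 0x1048 falls outside it and cannot.
```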
-
Patent number: 11797307
Abstract: In response to an instruction decoder decoding a range prefetch instruction specifying first and second address-range-specifying parameters and a stride parameter, prefetch circuitry controls, depending on the first and second address-range-specifying parameters and the stride parameter, prefetching of data from a plurality of specified ranges of addresses into the at least one cache. A start address and size of each specified range is dependent on the first and second address-range-specifying parameters. The stride parameter specifies an offset between start addresses of successive specified ranges. Use of the range prefetch instruction helps to improve programmability and improve the balance between prefetch coverage and circuit area of the prefetch circuitry.
Type: Grant
Filed: June 23, 2021
Date of Patent: October 24, 2023
Assignee: Arm Limited
Inventors: Krishnendra Nathella, David Hennah Mansell, Alejandro Rico Carro, Andrew Mundy
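How the address-range-specifying parameters and the stride parameter expand into multiple ranges can be sketched as follows; the parameter names and the range count are illustrative (the abstract does not say how the number of ranges is encoded):

```python
def prefetch_ranges(start, size, stride, count):
    """Expand a range-prefetch descriptor into its specified address
    ranges: `count` ranges of `size` bytes each, with successive start
    addresses `stride` bytes apart (e.g. the accessed rows of a 2-D
    array whose row pitch is larger than the accessed width)."""
    return [(start + i * stride, start + i * stride + size)
            for i in range(count)]

# Three 16-byte rows whose start addresses are 64 bytes apart:
rows = prefetch_ranges(0x1000, 16, 64, 3)
```

Describing the whole strided family of ranges in one instruction is what lets the prefetcher cover the pattern without having to rediscover it from individual misses.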
-
Patent number: 11726917
Abstract: A method includes recording a first set of consecutive memory access deltas, where each of the consecutive memory access deltas represents a difference between two memory addresses accessed by an application, updating values in a prefetch training table based on the first set of memory access deltas, and predicting one or more memory addresses for prefetching responsive to a second set of consecutive memory access deltas and based on values in the prefetch training table.
Type: Grant
Filed: July 13, 2020
Date of Patent: August 15, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Susumu Mashimo, John Kalamatianos
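A delta-based training table of this general flavor might be sketched as below. The table organization used here (last delta mapped to counts of the deltas that followed it) is an assumption for illustration; the patent's actual table layout may differ:

```python
from collections import defaultdict

class DeltaPrefetcher:
    """Train on consecutive address deltas; predict the next address by
    looking up which delta most often followed the current one."""
    def __init__(self):
        self.table = defaultdict(lambda: defaultdict(int))
        self.last_addr = None
        self.last_delta = None

    def train(self, addr):
        """Record one access: update the (previous delta -> next delta)
        counts in the training table."""
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if self.last_delta is not None:
                self.table[self.last_delta][delta] += 1
            self.last_delta = delta
        self.last_addr = addr

    def predict(self):
        """Return the predicted next address, or None if untrained."""
        followers = self.table.get(self.last_delta)
        if not followers:
            return None
        best = max(followers, key=followers.get)
        return self.last_addr + best

pf = DeltaPrefetcher()
for a in [100, 108, 124, 132, 148, 156]:  # delta pattern 8, 16, 8, 16, 8
    pf.train(a)
nxt = pf.predict()  # last delta was 8; 16 always followed 8, so 156 + 16
```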
-
Patent number: 11620133
Abstract: Systems and methods for reusing load instructions by a processor without accessing a data cache include a load store execution unit (LSU) of the processor, the LSU being configured to determine if a prior execution of a first load instruction loaded data from a first cache line of the data cache and determine if a current execution of the second load instruction will load the data from the first cache line of the data cache. Further, the LSU also determines if a reuse of the data from the prior execution of the first load instruction for the current execution of the second load instruction will lead to functional errors. If there are no functional errors, the data from the prior execution of the first load instruction is reused for the current execution of the second load instruction, without accessing the data cache for the current execution of the second load instruction.
Type: Grant
Filed: March 28, 2019
Date of Patent: April 4, 2023
Assignee: Qualcomm Incorporated
Inventor: Vignyan Reddy Kothinti Naresh
-
Patent number: 11593113
Abstract: Unaligned atomic memory operations on a processor using a load-store instruction set architecture (ISA) that requires aligned accesses are performed by widening the memory access to an aligned address by the next larger power of two (e.g., 4-byte access is widened to 8 bytes, and 8-byte access is widened to 16 bytes). Data processing operations supported by the load-store ISA including shift, rotate, and bitfield manipulation are utilized to modify only the bytes in the original unaligned address so that the atomic memory operations are aligned to the widened access address. The aligned atomic memory operations using the widened accesses avoid the faulting exceptions associated with unaligned access for most 4-byte and 8-byte accesses. Exception handling is performed in cases in which memory access spans a 16-byte boundary.
Type: Grant
Filed: October 4, 2021
Date of Patent: February 28, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Darek Mihocka, Arun Upadhyaya Kishan, Pedro Miguel Sequeira De Justo Teixeira
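The widening arithmetic can be sketched with plain integer math: a 4-byte access widens to an aligned 8-byte access, and a shift/mask pair selects only the original bytes inside the widened word. The addresses below are just an example, and the boundary-spanning fallback mirrors the exception-handling case the abstract mentions:

```python
def widen_unaligned(addr, size):
    """Widen an unaligned `size`-byte access to an access of the next
    larger power of two (4 -> 8 bytes, 8 -> 16), aligned to that width.
    Returns the aligned base address, the widened size, the byte offset
    of the original access inside the widened word, and a mask that
    selects only the original bytes."""
    wide = size * 2                       # next larger power of two
    base = (addr // wide) * wide          # aligned widened address
    offset = addr - base                  # where the original bytes sit
    if offset + size > wide:
        # Access spans a widened-access boundary: as in the patent's
        # 16-byte-boundary case, fall back to exception handling.
        raise ValueError("access spans a widened-access boundary")
    mask = ((1 << (8 * size)) - 1) << (8 * offset)
    return base, wide, offset, mask

# A 4-byte access at 0x1003 becomes an aligned 8-byte access at 0x1000;
# the mask covers bytes 3..6 of the widened word.
base, wide, offset, mask = widen_unaligned(0x1003, 4)
```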
-
Patent number: 11573739
Abstract: An information processing apparatus includes: a first memory; a second memory different in processing speed from the first memory; and a processor including: a memory controller that is coupled to the first memory and the second memory and that controls an access to the first memory and an access to the second memory; and a plurality of controllers that access the first memory or the second memory. The processor is configured to suppress a writing frequency of data into the second memory by controlling one or more first controllers that access the second memory among the plurality of controllers in accordance with a result of monitoring a state of writing the data into the second memory.
Type: Grant
Filed: January 25, 2021
Date of Patent: February 7, 2023
Assignee: FUJITSU LIMITED
Inventor: Satoshi Imamura
-
Patent number: 11449429
Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template register specifies a circular address mode for the loop, first and second block size numbers, and a circular address block size selection. For a first circular address block size selection, the block size corresponds to the first block size number. For a second circular address block size selection, the block size corresponds to the sum of the first block size number and the second block size number.
Type: Grant
Filed: April 20, 2021
Date of Patent: September 20, 2022
Assignee: Texas Instruments Incorporated
Inventor: Joseph Zbiciak
-
Patent number: 11403082
Abstract: Systems and methods are configured to receive code containing an original loop that includes irregular memory accesses. The original loop can be split. A pre-execution loop that contains code to prefetch content of the memory can be generated. Execution of the pre-execution loop can access memory inclusively between a starting location and the starting location plus a prefetch distance. A modified loop can be generated that performs at least one computation based on the content prefetched by execution of the pre-execution loop. Execution of the modified loop can follow the execution of the pre-execution loop. The original loop can be replaced with the pre-execution loop and the modified loop.
Type: Grant
Filed: April 30, 2021
Date of Patent: August 2, 2022
Assignee: Hewlett Packard Enterprise Development LP
Inventors: Sanyam Mehta, Gary William Elsesser, Terry D. Greyzck
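The loop-splitting transformation can be sketched on an indirect-access loop. The `dist` parameter stands in for the prefetch distance, and a plain read stands in for an actual prefetch instruction; both the names and the warm-up structure are illustrative, not the patent's exact code transformation:

```python
PREFETCH_DISTANCE = 8  # assumed lookahead, in iterations

def indirect_sum(data, index):
    """Original loop: irregular accesses through data[index[i]]."""
    total = 0
    for i in range(len(index)):
        total += data[index[i]]
    return total

def indirect_sum_split(data, index, dist=PREFETCH_DISTANCE):
    """Split form of the same loop."""
    # Pre-execution loop: prefetch content for the first `dist`
    # iterations (a plain read stands in for a prefetch instruction).
    for i in range(min(dist, len(index))):
        _ = data[index[i]]
    # Modified loop: compute, prefetching `dist` iterations ahead so
    # the irregular accesses stay warm.
    total = 0
    for i in range(len(index)):
        if i + dist < len(index):
            _ = data[index[i + dist]]
        total += data[index[i]]
    return total

data = [10 * v for v in range(10)]
index = [3, 0, 7, 7, 2]
```

Both forms compute the same result; the split form only changes when the irregular addresses are first touched.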
-
Patent number: 9811341
Abstract: Disclosed is an apparatus and method to manage instruction cache prefetching from an instruction cache. A processor may comprise: a prefetch engine; a branch prediction engine to predict the outcome of a branch; and a dynamic optimizer. The dynamic optimizer may be used to identify common instruction cache misses and insert a prefetch instruction from the prefetch engine into the instruction cache.
Type: Grant
Filed: December 29, 2011
Date of Patent: November 7, 2017
Assignee: Intel Corporation
Inventors: Kyriakos A. Stavrou, Enric Gibert Codina, Josep M. Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Fernando Latorre, Pedro Lopez, Marc Lupon, Carlos Madriles Gimeno, Grigorios Magklis, Pedro Marcuello, Alejandro Martinez Vicente, Raul Martinez, Daniel Ortega, Demos Pavlou, Georgios Tournavitis, Polychronis Xekalakis
-
Patent number: 8918626
Abstract: The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. When an instruction retires during the lookahead mode, a working register which serves as a destination register for the instruction is not copied to a corresponding architectural register. Instead the architectural register is marked as invalid. Note that by not updating architectural registers during lookahead mode, the system eliminates the need to checkpoint the architectural registers prior to entering lookahead mode.
Type: Grant
Filed: November 10, 2011
Date of Patent: December 23, 2014
Assignee: Oracle International Corporation
Inventors: Yuan C. Chou, Eric W. Mahurin
-
Patent number: 8195888
Abstract: Technologies are generally described for allocating available prefetch bandwidth among processor cores in a multiprocessor computing system. The prefetch bandwidth associated with an off-chip memory interface of the multiprocessor may be determined, partitioned, and allocated across multiple processor cores.
Type: Grant
Filed: March 20, 2009
Date of Patent: June 5, 2012
Assignee: Empire Technology Development LLC
Inventor: Yan Solihin
-
Patent number: 8156286
Abstract: A microprocessor includes a cache memory, a prefetch unit, and detection logic. The prefetch unit may be configured to monitor memory accesses that miss in the cache and to determine whether to prefetch one or more blocks of memory from a system memory based upon previous memory accesses. The prefetch unit may be further configured to use addresses of the memory accesses that miss to calculate each next memory block to prefetch. The detection logic may be configured to provide a notification to the prefetch unit in response to detecting a memory access instruction including a particular hint. In response to receiving the notification, the prefetch unit may be configured to inhibit using an address associated with the memory access instruction including the particular hint, when calculating subsequent memory blocks to prefetch.
Type: Grant
Filed: December 30, 2008
Date of Patent: April 10, 2012
Assignee: Advanced Micro Devices, Inc.
Inventor: Thomas M. Deneau
-
Patent number: 8032713
Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design is provided. The design structure generally includes a computer system that includes a CPU, a storage device, circuitry for providing a speculative access threshold corresponding to a selected percentage of the total number of accesses to the storage device that can be speculatively issued, and circuitry for intermixing demand accesses and speculative accesses in accordance with the speculative access threshold.
Type: Grant
Filed: May 5, 2008
Date of Patent: October 4, 2011
Assignee: International Business Machines Corporation
Inventors: James J. Allen, Jr., Steven K. Jenkins, James A. Mossman, Michael R. Trombley
-
Patent number: 7949830
Abstract: A system and method for handling speculative read requests for a memory controller in a computer system are provided. In one example, a method includes the steps of providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and intermixing demand reads and speculative reads in accordance with the speculative read threshold. In another example, a computer system includes a CPU, a memory controller, memory, a bus connecting the CPU, memory controller and memory, circuitry for providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and circuitry for intermixing demand reads and speculative reads in accordance with the speculative read threshold.
Type: Grant
Filed: December 10, 2007
Date of Patent: May 24, 2011
Assignee: International Business Machines Corporation
Inventors: James Johnson Allen, Jr., Steven Kenneth Jenkins, James A. Mossman, Michael Raymond Trombley
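The threshold test for intermixing demand and speculative reads might be sketched as follows: admit a new speculative read only if doing so keeps speculative reads within the selected percentage of all reads issued. The exact accounting (counting the candidate read itself) is an assumption for illustration:

```python
def should_issue_speculative(total_reads, speculative_reads, threshold_pct):
    """Admit a new speculative read only while speculative reads stay
    within `threshold_pct` percent of all reads issued, counting the
    candidate read itself in both totals."""
    if total_reads == 0:
        return True
    return 100 * (speculative_reads + 1) <= threshold_pct * (total_reads + 1)

# With a 25% threshold: after 8 reads of which 1 was speculative, one
# more speculative read makes 2 of 9 (~22%), so it is admitted; a third
# would make 3 of 9 (~33%) and is held back as a demand read instead.
```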
-
Patent number: 7937533
Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design is provided. The design structure generally includes a computer system that includes a CPU, a memory controller, memory, a bus connecting the CPU, memory controller and memory, circuitry for providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and circuitry for intermixing demand reads and speculative reads in accordance with the speculative read threshold.
Type: Grant
Filed: May 4, 2008
Date of Patent: May 3, 2011
Assignee: International Business Machines Corporation
Inventors: James J. Allen, Jr., Steven K. Jenkins, James A. Mossman, Michael R. Trombley