With Dedicated Cache, E.g., Instruction Or Stack, Etc. (epo) Patents (Class 711/E12.02)
  • Patent number: 11194574
    Abstract: An apparatus is described, comprising load issuing circuitry configured to issue load operations to load data from memory, and memory ordering tracking storage circuitry configured to store memory ordering tracking information on issued load operations. The apparatus also includes control circuitry configured to access the memory ordering tracking storage circuitry to determine, using the memory ordering tracking information, whether at least one load operation has been issued in disagreement with a memory ordering requirement, and, if so, to determine whether to re-issue one or more issued load operations or to continue issuing load operations despite disagreement with the memory ordering requirement. Furthermore, the control circuitry is capable of merging the memory ordering tracking information for a plurality of issued load operations into a merged entry in the memory ordering tracking storage circuitry.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: December 7, 2021
    Assignee: Arm Limited
    Inventors: Miles Robert Dooley, Balaji Vijayan, Huzefa Moiz Sanjeliwala, Abhishek Raja, Sharmila Shridhar
  • Patent number: 11099846
    Abstract: A method and apparatus generates control information that indicates whether to change a counter value associated with a particular load instruction. In response to the control information, the method and apparatus causes a hysteresis effect for operating between a speculative mode and a non-speculative mode based on the counter value. The hysteresis effect is in favor of the non-speculative mode. The method and apparatus causes the hysteresis effect by incrementing the counter value associated with the particular load instruction by a first value or decrementing the counter value by a second value. The first value is greater than the second value.
    Type: Grant
    Filed: June 20, 2018
    Date of Patent: August 24, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Krishnan V. Ramani, Chetana N. Keltcher
  • Patent number: 10929142
    Abstract: Provided are embodiments including a computer-implemented method, system and computer program product for determining precise operand-store-compare (OSC) predictions to avoid false dependencies. Some embodiments include detecting an instruction causing an OSC event, wherein the OSC event is at least one of a store-hit-load event or a load-hit-store event, marking an entry in a queue for the instruction based on the detected OSC event, wherein marking the entry comprises setting a bit and saving a tag in the entry in the queue. Some embodiments also include installing an address for the instruction and the tag in the history table responsive to completing the instruction.
    Type: Grant
    Filed: March 20, 2019
    Date of Patent: February 23, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory William Alexander, James Bonanno, Adam Collura, James Raymond Cuffney, Yair Fried, Jonathan Hsieh, Jang-Soo Lee, Edward Malley, Anthony Saporito, Eyal Naor
  • Patent number: 10908902
    Abstract: Examples of techniques for distance-based branch prediction are disclosed. In one example implementation according to aspects of the present disclosure, a computer-implemented method includes: determining, by a processing system, a potential return instruction address (IA) by determining whether a relationship is satisfied between a first target IA and a first branch IA; storing a second branch IA as a return when a target IA of a second branch matches a potential return IA for the second branch; and applying the potential return IA for the second branch as a predicted target IA of a predicted branch IA stored as a return.
    Type: Grant
    Filed: May 26, 2016
    Date of Patent: February 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Michael J. Cadigan, Jr., Adam B. Collura, Daniel Lipetz, Brian R. Prasky
  • Patent number: 10620881
    Abstract: An apparatus includes an interface for dynamic random access memory (DRAM); and an integrated circuit. The integrated circuit includes a memory pinout configured to connect to the memory and control logic. The control logic is configured multiplex address information, command information, and data to be written to or read from the DRAM memory on a subset of pins of the memory pinout to the DRAM memory. The control logic is further configured to route other signals on other pins of the memory pinout to the DRAM in parallel with the multiplexed address information, command information, and data information.
    Type: Grant
    Filed: April 23, 2018
    Date of Patent: April 14, 2020
    Assignee: MICROCHIP TECHNOLOGY INCORPORATED
    Inventors: Eric Matulik, Patrick Filippi, Marc Maunier
  • Patent number: 10162755
    Abstract: A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: December 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Guy L. Guthrie, Hugh Shen, William J. Starke, Derek E. Williams
  • Patent number: 10162642
    Abstract: An instruction cache and data cache used to virtualize the storage of global data and instructions used by graphics shaders. Present day hardware design stores the global data and instructions used by the shaders in a fixed amount of registers or writable control store (WCS). However, this traditional approach limits the size and the complexity of the shaders that can be supported. By virtualizing the storage of the global data and instructions, the amount of global or state memory available to the shader and the length of the shading programs are no longer constrained by the physical on-chip memory.
    Type: Grant
    Filed: February 4, 2014
    Date of Patent: December 25, 2018
    Assignee: ZiiLABS Inc. Ltd.
    Inventor: David R. Baldwin
  • Patent number: 9870469
    Abstract: In an example, a stack protection engine is disclosed for preventing or ameliorating stack corruption attacks. The stack protection engine may operate transparently to user-space processes. After a call to a subroutine from a parent routine, the stack protection engine encodes the return address on the stack, such as with an exclusive or cipher and a key selected from a key array. After the subroutine returns control to the main routine, the stack protection engine decodes the address, and returns control. If a stack corruption attack occurs, the malicious return address is not properly encoded, so that when decoding occurs, the program may simply crash rather than returning control to the malicious code.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: January 16, 2018
    Assignee: McAfee, Inc.
    Inventor: Simon Crowe
  • Patent number: 9563558
    Abstract: A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.
    Type: Grant
    Filed: August 28, 2014
    Date of Patent: February 7, 2017
    Assignee: International Business Machines Corporation
    Inventors: Guy L. Guthrie, Hugh Shen, William J. Starke, Derek E. Williams
  • Patent number: 9514045
    Abstract: A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.
    Type: Grant
    Filed: April 4, 2014
    Date of Patent: December 6, 2016
    Assignee: International Business Machines Corporation
    Inventors: Guy L. Guthrie, Hugh Shen, William J. Starke, Derek E. Williams
  • Patent number: 9043553
    Abstract: Various technologies and techniques are disclosed for using transactional memory hardware to accelerate virtualization or emulation. State isolation can be facilitated by providing isolated private state on transactional memory hardware and storing the stack of a host that is performing an emulation in the isolated private state. Memory accesses performed by a central processing unit can be monitored by software to detect that a guest being emulated has made a self modification to its own code sequence. Transactional memory hardware can be used to facilitate dispatch table updates in multithreaded environments by taking advantage of the atomic commit feature. An emulator is provided that uses a dispatch table stored in main memory to convert a guest program counter into a host program counter. The dispatch table is accessed to see if the dispatch table contains a particular host program counter for a particular guest program counter.
    Type: Grant
    Filed: June 27, 2007
    Date of Patent: May 26, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Martin Taillefer, Darek Mihocka, Bruno Silva
  • Patent number: 8954698
    Abstract: Memory is dynamically switched through the optical-switching fabric using at least one communication pattern to transfer memory space in the memory blades from one processor to an alternative processor in the processor blades without physically copying data in the memory to the processors. Various communication patterns for the dynamically switching are supported.
    Type: Grant
    Filed: April 13, 2012
    Date of Patent: February 10, 2015
    Assignee: International Business Machines Corporation
    Inventors: Eugen Schenfeld, Abhirup Chakraborty
  • Patent number: 8935464
    Abstract: A system including an interface module to interface a solid-state disk controller to a computing device. A memory control module exchanges data with the computing device via the interface module and caches the data in a solid-state memory controlled by the solid-state disk controller. A network interface module communicates with the computing device via the interface module and interfaces the computing device to a wireless network. A crossbar module has a master bus (Mbus) interface bridged to an advanced high-performance bus (AHB). A memory communicates with one or more of the network interface module and the crossbar module via one or more of the Mbus interface and the AHB. In response to data being cached from the computing device to the solid-state memory or data cached in the solid-state memory being output to the computing device, the network interface module buffers data received from the wireless network in the memory.
    Type: Grant
    Filed: April 30, 2014
    Date of Patent: January 13, 2015
    Assignee: Marvell World Trade Ltd.
    Inventors: Sehat Sutardja, Po-Chien Chang, Roawen Chen
  • Patent number: 8914580
    Abstract: In some embodiments, a cache may include a tag array and a data array, as well as circuitry that detects whether accesses to the cache are sequential (e.g., occupying the same cache line). For example, a cache may include a tag array and a data array that stores data, such as multiple bundles of instructions per cache line. During operation, it may be determined that successive cache requests are sequential and do not cross a cache line boundary. Responsively, various cache operations may be inhibited to conserve power. For example, access to the tag array and/or data array, or portions thereof, may be inhibited.
    Type: Grant
    Filed: August 23, 2010
    Date of Patent: December 16, 2014
    Assignee: Apple Inc.
    Inventors: Rajat Goel, Ian D. Kountanis
  • Patent number: 8850121
    Abstract: A load/store unit with an outstanding load miss buffer and a load miss result buffer is configured to read data from a memory system having a level one cache. Missed load instructions are stored in the outstanding load miss buffer. The load/store unit retrieves data for multiple dependent missed load instructions using a single cache access and stores the data in the load miss result buffer. The outstanding load miss buffer stores a first missed load instruction in a first primary entry. Additional missed load instructions that are dependent on the first missed load instructions are stored in dependent entries of the first primary entry or in shared entries. If a shared entry is used for a missed load instruction the shared entry is associated with the primary entry.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: September 30, 2014
    Assignee: Applied Micro Circuits Corporation
    Inventors: Matthew W. Ashcraft, John Gregory Favor, David A. Kruckemyer
  • Patent number: 8838934
    Abstract: A system and method for providing high-speed memory operations is disclosed. The technique uses virtualization of memory space to map a virtual address space to a larger physical address space wherein no memory bank conflicts will occur. The larger physical address space is used to prevent memory bank conflicts from occurring by moving the virtualized memory addresses of data being written to memory to a different location in physical memory that will eliminate a memory bank conflict. To improve memory performance destructive read operations are used when reading data but the data is written back into the physical memory in a later cycle.
    Type: Grant
    Filed: April 18, 2013
    Date of Patent: September 16, 2014
    Assignee: Memoir Systems, Inc.
    Inventors: Sundar Iver, Shang-Tse Chuang
  • Patent number: 8832389
    Abstract: Domains can also be used to control access to physical memory space. Data in a physical memory space that has been used by a process sometimes endures after the process stops using the physical memory space (e.g., the process terminates). In addition, a virtual memory manager may allow processes of different applications to access a same memory space. To prevent exposure of sensitive/confidential data, physical memory spaces can be designated for a specific domain or domains when the physical memory spaces are allocated.
    Type: Grant
    Filed: January 14, 2011
    Date of Patent: September 9, 2014
    Assignee: International Business Machines Corporation
    Inventors: Saurabh Desai, George Mathew Koikara, Pruthvi Panyam Nataraj, Guha Prasad Venkataraman, Vidya Ranganathan
  • Patent number: 8825958
    Abstract: A digital system is provided for high-performance cache systems. The digital system includes a processor core and a cache control unit. The processor core is capable of being coupled to a first memory containing executable instructions and a second memory with a faster speed than the first memory. Further, the processor core is configured to execute one or more instructions of the executable instructions from the second memory. The cache control unit is configured to be couple to the first memory, the second memory, and the processor core to fill at least the one or more instructions from the first memory to the second memory before the processor core executes the one or more instructions.
    Type: Grant
    Filed: August 8, 2013
    Date of Patent: September 2, 2014
    Assignee: Shanghai Xin Hao Micro Electronics Co. Ltd.
    Inventors: Kenneth Chenghao Lin, Haoqi Ren
  • Patent number: 8762644
    Abstract: A particular method includes loading one or more memory images into a multi-way cache. The memory images are associated with an audio decoder, and the multi-way cache is accessible to a processor. Each of the memory images is sized not to exceed a page size of the multi-way cache.
    Type: Grant
    Filed: February 25, 2011
    Date of Patent: June 24, 2014
    Assignee: QUALCOMM Incorporated
    Inventor: Michael Warren Castelloe
  • Patent number: 8751747
    Abstract: A method for managing cache memory including receiving an instruction fetch for an instruction stream in a cache memory, wherein the instruction fetch includes an instruction fetch reference tag for the instruction stream and the instruction stream is at least partially included within a cache line, comparing the instruction fetch reference tag to a previous instruction fetch reference tag, maintaining a cache replacement status of the cache line if the instruction fetch reference tag is the same as the previous instruction fetch reference tag, and upgrading the cache replacement status of the cache line if the instruction fetch reference tag is different from the previous instruction fetch reference tag, whereby the cache replacement status of the cache line is upgraded if the instruction stream is independently fetched more than once. A corresponding system and computer program product.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: June 10, 2014
    Assignee: International Business Machines Corporation
    Inventors: Robert J. Sonnelitter, III, Gregory W. Alexander, Brian R. Prasky
  • Patent number: 8731688
    Abstract: A processing apparatus, which contains a processor that executes a program includes a series of instructions, includes a log recording unit configured to record an operation log of the processing apparatus; a managing unit configured to control a recording operation performed by the log recording unit and read the operation log recorded in the log recording unit; an input unit configured to detect, from among the series of instructions of the executed program; a start instruction that starts a process for delivering a control instruction destined for the managing unit to the managing unit and deliver the control instruction to the managing unit in response to the start instruction; and an output unit configured to receive the operation log read by the managing unit.
    Type: Grant
    Filed: March 17, 2010
    Date of Patent: May 20, 2014
    Assignee: Fujitsu Limited
    Inventors: Iwao Yamazaki, Michiharu Hara, Eiji Yamanaka
  • Publication number: 20140136786
    Abstract: A processor includes a processor core, a cache, and a tracker. The processor core is configured to execute persistent write instructions and receive notifications of completed persistent write instructions. The tracker is configured to track the completion state of a persistent write instruction.
    Type: Application
    Filed: November 13, 2012
    Publication date: May 15, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gary D. Carpenter, Stefanie R. Chiras, Alexandre P. Ferreira, Jente B. Kuang, Karthick Rajamani, Freeman L. Rawson, III
  • Patent number: 8719485
    Abstract: A solid-state disk (SSD) controller includes a first integrated circuit (IC) that includes an interface module, a memory control module, and a wireless network interface module. The interface module externally interfaces the SSD controller to a computing device. The memory control module controls solid-state memory, receives data from the computing device via the interface module, and caches the data in the solid-state memory. The wireless network interface module communicates with the computing device via the interface module and allows the computing device to connect to a wireless network.
    Type: Grant
    Filed: March 11, 2009
    Date of Patent: May 6, 2014
    Assignee: Marvell World Trade Ltd.
    Inventors: Sehat Sutardja, Po-Chien Chang, Roawen Chen
  • Publication number: 20140115257
    Abstract: A processor stores branch information at a “sparse” cache and a “dense” cache. The sparse cache stores the target addresses for up to a specified number of branch instructions in a given cache entry associated with a cache line address, while branch information for additional branch instructions at the cache entry is stored at the dense cache. Branch information at the dense cache persists after eviction of the corresponding cache line until it is replaced by branch information for a different cache entry. Accordingly, in response to the instructions for a given cache line address being requested for retrieval from memory, a prefetcher determines whether the dense cache stores branch information for the cache line address. If so, the prefetcher prefetches the instructions identified by the target addresses of the branch information in the dense cache concurrently with transferring the instructions associated with the cache line address.
    Type: Application
    Filed: October 22, 2012
    Publication date: April 24, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventor: James D. Dundas
  • Publication number: 20140089588
    Abstract: Method and system of storing data by a software application. Each read query of a data storage system by a software application is first solely issued to a plurality of cache nodes, which returns the queried data if available. If not available, the software application receives a miss that triggers a fetch of the queried data from one or more database systems on a first dedicated interface. Upon having retrieved the queried data, the software application adds the queried data to at least one cache node on a second dedicated. Each writing of the one or more database systems by the software application is also concurrently performed in the at least one cache node. Hence, population of the at least one cache node is quickly done at each missed read query of the at least one cache node and at each write query of the data storage system.
    Type: Application
    Filed: September 27, 2012
    Publication date: March 27, 2014
    Applicant: AMADEUS S.A.S.
    Inventors: Jean-Charles Redoutey, Joel Singer, Florent Balard, Florian Prud'homme, Romain Bouteloup, Colin Pitrat
  • Publication number: 20140089589
    Abstract: Methods and processors for enforcing an order of memory access requests in the presence of barriers in an out-of-order processor pipeline. A speculative color is assigned to instruction operations in the front-end of the processor pipeline, while the instruction operations are still in order. The instruction operations are placed in any of multiple reservation stations and then issued out-of-order from the reservation stations. When a barrier is encountered in the front-end, the speculative color is changed, and instruction operations are assigned the new speculative color. A core interface unit maintains an architectural color, and the architectural color is changed when a barrier retires. The core interface unit stalls instruction operations with a speculative color that does match the architectural color.
    Type: Application
    Filed: September 27, 2012
    Publication date: March 27, 2014
    Applicant: APPLE INC.
    Inventors: Stephan G. Meier, Gerard R. Williams, III
  • Publication number: 20140025893
    Abstract: Execution of non-native operating system images within a virtualized computer system is improved by providing a mechanism for retrieving translated code physical addresses corresponding to un-translated code branch target addresses using a host code map. Hardware acceleration mechanisms, such as content-accessible look-up tables, directory hardware, or processor instructions that operate on tables in memory can be provided to accelerate the performance of the translation mechanism. The virtual address of the branch instruction target is used as a key to look up a corresponding record that contains a physical address of the translated code page containing the translated branch instruction target, and execution is directed to the physical address obtained from the record, once the physical page containing the translated code corresponding the target address is loaded in memory.
    Type: Application
    Filed: July 20, 2012
    Publication date: January 23, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Alexander Barraclough Brown
  • Publication number: 20130346698
    Abstract: A data processing apparatus and method, the apparatus including processing circuitry for executing a sequence of instructions, each instruction having an associated memory address and the sequence of instructions including cacheable instructions whose associated memory addresses are within a cacheable memory region. Instruction cache control circuitry is arranged to store within a selected cache line of a data storage the instruction data for a plurality of cacheable instructions as retrieved from memory, to store within the tag entry associated with that selected cache line the address identifier for that stored instruction data, and to identify that selected cache line as valid within the valid flag storage. Control state circuitry maintains a record of the chosen cache line in which said data of a predetermined data type has been written, so that upon receipt of a request for that data it can then be provided from the instruction cache.
    Type: Application
    Filed: June 26, 2012
    Publication date: December 26, 2013
    Applicant: ARM LIMITED
    Inventors: Alex James Waugh, Matthew Lee Winrow
  • Publication number: 20130304993
    Abstract: Systems and methods are disclosed for maintaining an instruction cache including extended cache lines and page attributes for main cache line portions of the extended cache lines and, at least for one or more predefined potential page-crossing instruction locations, additional page attributes for extra data portions of the corresponding extended cache lines. In addition, systems and methods are disclosed for processing page-crossing instructions fetched from an instruction cache having extended cache lines.
    Type: Application
    Filed: June 28, 2012
    Publication date: November 14, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Leslie Mark DeBruyne, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel
  • Publication number: 20130290640
    Abstract: In one embodiment, a microprocessor is provided. The microprocessor includes instruction memory and a branch prediction unit. The branch prediction unit is configured to use information from the instruction memory to selectively power up the branch prediction unit from a powered-down state when fetched instruction data includes a branch instruction and maintain the branch prediction unit in the powered-down state when the fetched instruction data does not include a branch instruction in order to reduce power consumption of the microprocessor during instruction fetch operations.
    Type: Application
    Filed: April 27, 2012
    Publication date: October 31, 2013
    Applicant: NVIDIA CORPORATION
    Inventors: Aneesh Aggarwal, Ross Segelken, Kevin Koschoreck, Paul Wasson
  • Publication number: 20130290639
    Abstract: A processor uses a dedicated buffer to reduce the amount of time needed to execute memory copy operations. For each load instruction associated with the memory copy operation, the processor copies the load data from memory to the dedicated buffer. For each store operation associated with the memory copy operation, the processor retrieves the store data from the dedicated buffer and transfers it to memory. The dedicated buffer is separate from a register file and caches of the processor, so that each load operation associated with a memory copy operation does not have to wait for data to be loaded from memory to the register file. Similarly, each store operation associated with a memory copy operation does not have to wait for data to be transferred from the register file to memory.
    Type: Application
    Filed: April 25, 2012
    Publication date: October 31, 2013
    Applicant: FREESCALE SEMICONDUCTOR, INC.
    Inventors: Thang M. Tran, James Yang
  • Publication number: 20130282950
    Abstract: A method for selectively placing cache data, comprising the steps of (A) determining a line temperature for a plurality of devices, (B) determining a device temperature for the plurality of devices, (C) calculating an entry temperature for the plurality of devices in response to the cache line temperature and the device temperature and (D) distributing a plurality of write operations across the plurality of devices such that thermal energy is distributed evenly over the plurality of devices.
    Type: Application
    Filed: April 24, 2012
    Publication date: October 24, 2013
    Inventors: Luca Bert, Mark Ish, Rajiv Ganth Rajaram
  • Publication number: 20130238859
    Abstract: Disclosed are a cache with a scratch pad memory (SPM) structure and a processor including the same. The cache with a scratch pad memory structure includes: a block memory configured to include at least one block area in which instruction codes read from an external memory are stored; a tag memory configured to store an external memory address corresponding to indexes of the instruction codes stored in the block memory; and a tag controller configured to process a request from a fetch unit for the instruction codes, wherein a part of the block areas is set as a SPM area according to cache setting input from a cache setting unit. According to the present invention, it is possible to reduce the time to read instruction codes from the external memory and realize power saving by operating the cache as the scratch pad memory.
    Type: Application
    Filed: November 19, 2012
    Publication date: September 12, 2013
    Applicant: Electronics and Telecommunications Research Institute
    Inventor: Jin Ho HAN
  • Patent number: 8527707
    Abstract: A digital system is provided for high-performance cache systems. The digital system includes a processor core and a cache control unit. The processor core is capable of being coupled to a first memory containing executable instructions and a second memory with a faster speed than the first memory. Further, the processor core is configured to execute one or more instructions of the executable instructions from the second memory. The cache control unit is configured to be couple to the first memory, the second memory, and the processor core to fill at least the one or more instructions from the first memory to the second memory before the processor core executes the one or more instructions.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: September 3, 2013
    Assignee: Shanghai Xin Hao Micro Electronics Co. Ltd.
    Inventors: Kenneth Chenghao Lin, Haoqi Ren
  • Publication number: 20130173837
    Abstract: Methods and apparatus are provided for implementing a lightweight notification (LN) protocol in the PCI Express base specification which allows an endpoint function associated with a PCI Express device to register interest in one or more cachelines in host memory, and to request an LN notification message from the CPU/memory complex when the content of a registered cacheline changes. The LN notification message can be unicast to a single endpoint using ID-based routing, or broadcast to all devices on a given root port. The LN protocol may be implemented in the CPU complex by configuring a queue or other data structure in system memory for LN use. An endpoint registers a notification request by setting the LN bit in a “read” request of an LN configured cacheline.
    Type: Application
    Filed: December 30, 2011
    Publication date: July 4, 2013
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Stephen D. Glaser, Mark D. Hummel
  • Patent number: 8473683
    Abstract: A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. The third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushing only the first queue.
    Type: Grant
    Filed: January 7, 2011
    Date of Patent: June 25, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, Martin Ohmacht
  • Patent number: 8473682
    Abstract: According to one embodiment, a cache unit transferring data from a memory connected to the cache unit via a bus incompatible with a critical word first (CWF) to an L1-cache having a first line size and connected to the cache unit via a bus compatible with the CWF. The unit includes cache and un-cache controllers. The cache controller includes an L2-cache and a request converter. The L2-cache has a second line size greater than or equal to the first line size. The request converter converts a first refill request into a second refill request when a head address of a burst transfer of the first refill request is in the L2-cache. The un-cache controller transfers the second refill request to the memory, receives data to be processed corresponding to the second refill request from the memory, and transfers the received data to the L1-cache.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: June 25, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Soichiro Hosoda
  • Publication number: 20130159628
    Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Inventors: Jack Hilaire CHOQUETTE, Manuel Olivier Gautho, John Erik Lindholm
  • Publication number: 20130145210
    Abstract: For a flexible replication with skewed mapping in a multi-core chip, a request for a cache line is received, at a receiver core in the multi-core chip from a requester core in the multi-core chip. The receiver and requester cores comprise electronic circuits. The multi-core chip comprises a set of cores including the receiver and the requester cores. A target core is identified from the request to which the request is targeted. A determination is made whether the target core includes the requester core in a neighborhood of the target core, the neighborhood including a first subset of cores mapped to the target core according to a skewed mapping. The cache line is replicated, responsive to the determining being negative, from the target core to a replication core. The cache line is provided from the replication core to the requester core.
    Type: Application
    Filed: December 1, 2011
    Publication date: June 6, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jian Li, William Evan Speight
  • Publication number: 20130138888
    Abstract: A control transfer instruction (CTI), such as a branch, jump, etc., may have an offset value for a control transfer that is to be performed. The offset value may be usable to compute a target address for the CTI (e.g., the address of a next instruction to be executed for a thread or instruction stream). The offset may be specified relative to a program counter. In response to detecting a specified offset value, the CTI may be modified to include at least a portion of a computed target address. Information indicating this modification has been performed may be stored, for example, in a pre-decode bit. In some cases, CTI modification may be performed only when a target address is a “near” target, rather than a “far” target. Modifying CTIs as described herein may eliminate redundant address calculations and produce a savings of power and/or time in some embodiments.
    Type: Application
    Filed: November 30, 2011
    Publication date: May 30, 2013
    Inventors: Jama I. Barreh, Manish K. Shah, Christopher H. Olson
  • Publication number: 20130132676
    Abstract: The present invention extends to methods, systems, and computer program products for asynchronously binding data from a data source to a data target. A user interface thread and a separate thread are used to enable the user interface thread to continue execution rather than blocking to obtain updated data, to which elements of a user interface that the user interface thread is managing, are bound. The separate thread obtains updated data from a data source, stores the updated data in a local cache, and notifies the user interface thread of the updated data's presence in the local cache.
    Type: Application
    Filed: November 21, 2011
    Publication date: May 23, 2013
    Applicant: Microsoft Corporation
    Inventors: Akhilesh Kaza, Shawn Patrick Burke
  • Publication number: 20130124801
    Abstract: A technique to track a host controller cache that includes receiving from a host controller a command indicating whether a cache of the host controller has data which is to be stored to a storage system. In the event that the host controller fails, perform an operation to transfer control from the host controller to another host controller based on whether the command indicates that the data of the cache was stored to the storage system.
    Type: Application
    Filed: November 16, 2011
    Publication date: May 16, 2013
    Inventors: Balaji Natrajan, Michael G. Myrah
  • Publication number: 20130117510
    Abstract: In order to optimize efficiency of deserialization, a serialization cache is maintained at an object server. The serialization cache is maintained in conjunction with an object cache and stores serialized forms of objects cached within the object cache. When an inbound request is received, a serialized object received in the request is compared to the serialization cache. If the serialized byte stream is present in the serialization cache, then the equivalent object is retrieved from the object cache, thereby avoiding deserialization of the received serialized object. If the serialized byte stream is not present in the serialization cache, then the serialized byte stream is deserialized, the deserialized object is cached in the object cache, and the serialized object is cached in the serialization cache.
    Type: Application
    Filed: August 29, 2012
    Publication date: May 9, 2013
    Applicant: RECURSION SOFTWARE, INC.
    Inventors: Deren George Ebdon, Robert W. Peterson
  • Publication number: 20130117627
    Abstract: An method of operating a data cache controller is provided. The method includes transmitting first data output from a data cache to a central processing unit (CPU) core with a first latency and transmitting second data to the CPU core with a second latency greater than the first latency. The first latency is a delay between a read request to the data cache and transmission of the first data according to execution of a first instruction fetched from an instruction cache, and the second latency is a delay between a read request to the data cache and transmission of the second data according to execution of a second instruction fetched from the instruction cache.
    Type: Application
    Filed: April 13, 2012
    Publication date: May 9, 2013
    Inventors: Sung Hyun Lee, Jun Hee Yoo
  • Publication number: 20130111140
    Abstract: A cache memory apparatus according to the present invention includes a cache memory that caches an instruction code corresponding to a fetch address and a cache control circuit that controls the instruction code to be cached in the cache memory. The cache control circuit caches an instruction code corresponding to a subroutine when the fetch address indicates a branch into the subroutine and disables the instruction code to be cached when the number of the instruction codes to be cached exceeds a previously set maximum number.
    Type: Application
    Filed: November 2, 2012
    Publication date: May 2, 2013
    Applicant: RENESAS ELECTRONICS CORPORATION
    Inventor: Renesas Electronics Corporation
  • Publication number: 20130086327
    Abstract: An automatic caching system is described herein that automatically determines user-relevant points at which to incrementally cache expensive to obtain data, resulting in faster computation of dependent results. The system can intelligently choose between caching data locally and pushing computation to a remote location collocated with the data, resulting in faster computation of results. The automatic caching system uses stable keys to uniquely refer to programmatic identifiers. The system annotates programs before execution with additional code that utilizes the keys to associate and cache intermediate programmatic results. The system can maintain the cache in a separate process or even on a separate machine to allow cached results to outlive program execution and allow subsequent execution to utilize previously computed results. Cost estimations are performed in order to choose whether utilizing cached values or remote execution would result in a faster computation of a result.
    Type: Application
    Filed: November 10, 2011
    Publication date: April 4, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael Coulson, Gregory Hughes
  • Publication number: 20130080706
    Abstract: A classloader cache class definition is obtained by a processor. The classloader cache class definition includes code that creates a classloader object cache that is referenced by a strong internal reference by a classloader object in response to instantiation of the classloader cache class definition. A classloader object cache is instantiated using the obtained classloader cache class definition. The strong internal reference is created at instantiation of the classloader object cache. A public interface to the classloader object cache is provided. The public interface to the classloader object cache operates as a weak reference to the classloader object cache and provides external access to the classloader object cache.
    Type: Application
    Filed: September 23, 2011
    Publication date: March 28, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Vijay Sundaresan, Andres H. Voldman
  • Publication number: 20130080707
    Abstract: A classloader cache class definition is obtained by a processor. The classloader cache class definition includes code that creates a classloader object cache that is referenced by a strong internal reference by a classloader object in response to instantiation of the classloader cache class definition. A classloader object cache is instantiated using the obtained classloader cache class definition. The strong internal reference is created at instantiation of the classloader object cache. A public interface to the classloader object cache is provided. The public interface to the classloader object cache operates as a weak reference to the classloader object cache and provides external access to the classloader object cache.
    Type: Application
    Filed: March 29, 2012
    Publication date: March 28, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Vijay Sundaresan, Andres H. Voldman
  • Publication number: 20130054869
    Abstract: Example methods, apparatus, and articles of manufacture to access data are disclosed. A disclosed example method involves generating a key-value association table in a non-volatile memory to store physical addresses of a data cache storing data previously retrieved from a data structure. The example method also involves storing recovery metadata in the non-volatile memory. The recovery metadata includes a first address of the key-value association table in the non-volatile memory. In addition, following a re-boot process, the locations of the key-value association table and the data cache are retrieved using the recovery metadata without needing to access the data structure to re-generate the key-value association table and the data cache.
    Type: Application
    Filed: August 31, 2011
    Publication date: February 28, 2013
    Inventors: Niraj Tolia, Nathan Lorenzo Binkert, Jichuan Chang
  • Publication number: 20130054899
    Abstract: A processor may support a two-dimensional (2-D) gather instruction and a 2-D cache. The processor may perform the 2-D gather instruction to access one or more sub-blocks of data from a two-dimensional (2-D) image stored in a memory coupled to the processor. The two-dimensional (2-D) cache may store the sub-blocks of data in a multiple cache lines. Further, the 2-D cache may support access of more than one cache lines while preserving a two-dimensional structure of the 2-D image.
    Type: Application
    Filed: August 29, 2011
    Publication date: February 28, 2013
    Inventors: Boris Ginzburg, Oleg Margulis