Abstract: A method and apparatus for controlling the re-acquisition of lines of memory in a cache are provided. The method comprises storing at least one atomic instruction in a queue in response to the atomic instruction being retired, and identifying a target memory location associated with load and store portions of the atomic instruction. A line of memory associated with the target memory location is acquired and stored in a cache. Subsequently, if the acquired line is evicted, it is re-acquired in response to the atomic instruction becoming the oldest instruction stored in the queue. The apparatus comprises a queue and a cache. The queue is adapted for storing at least one atomic instruction in response to the atomic instruction being retired. A target memory location is associated with load and store portions of the atomic instruction.
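The re-acquisition scheme above can be sketched as a small model: retired atomic instructions wait in a FIFO queue, and an evicted line is re-acquired only when its atomic instruction becomes the oldest queue entry. The class and method names are illustrative, not taken from the patent.

```python
from collections import deque

class AtomicRetireQueue:
    """Hypothetical sketch: retired atomic instructions wait here, and an
    evicted line is re-acquired only for the oldest entry in the queue."""

    def __init__(self, cache):
        self.queue = deque()          # FIFO of (instr_id, target_line)
        self.cache = cache            # set of cache lines currently held

    def retire_atomic(self, instr_id, target_line):
        """Store the atomic in the queue on retirement and acquire its line."""
        self.queue.append((instr_id, target_line))
        self.cache.add(target_line)

    def evict(self, line):
        self.cache.discard(line)

    def tick(self):
        """Process the oldest atomic: re-acquire its line if it was evicted,
        then drain the entry from the queue."""
        if not self.queue:
            return None
        instr_id, line = self.queue.popleft()
        if line not in self.cache:
            self.cache.add(line)      # re-acquire only when oldest
        return instr_id
```

A younger atomic whose line is evicted simply waits; the re-acquire happens once it reaches the head of the queue.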
Abstract: At an instruction pipeline of a data processor, pipeline resource conflicts are detected by setting, for each executing instruction, one or more assignment indicators to indicate which pipeline resources are to be utilized for executing the instruction. The instruction pipeline detects a pipeline resource conflict if an instruction is assigned a pipeline resource for which the assignment indicator is set. In addition, for selected pipeline resources, such as registers in a register file, the instruction pipeline can detect a pipeline resource conflict if more than one instruction attempts to access the pipeline resource when the assignment indicator for the resource is set. In response to detecting a pipeline resource conflict, the instruction pipeline is flushed and returned to a checkpointed state, thereby protecting the instruction pipeline from architectural state errors.
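The assignment-indicator mechanism above amounts to a scoreboard: each pipeline resource carries a busy bit, and assigning a resource whose bit is already set signals a conflict (which would trigger a flush back to a checkpointed state). A minimal sketch, with invented names:

```python
class ResourceScoreboard:
    """Toy model of per-resource assignment indicators. Assigning a
    resource whose indicator is set is a conflict; real hardware would
    respond by flushing the pipeline to a checkpoint."""

    def __init__(self, resources):
        self.owner = {r: None for r in resources}  # resource -> owning instr

    def assign(self, instr, resource):
        if self.owner[resource] is not None:       # indicator already set
            return "conflict"                      # flush / restore checkpoint
        self.owner[resource] = instr               # set the indicator
        return "ok"

    def release(self, resource):
        self.owner[resource] = None                # clear on completion
```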
Abstract: A method for forming an integrated circuit system includes providing an integrated circuit device; and forming an integrated contact over the integrated circuit device including: providing a via over the integrated circuit device; forming a selective metal in the via; forming at least one nanotube over the selective metal; and forming a cap over the at least one nanotube.
Abstract: The present invention provides a method and apparatus for supporting embodiments of an out-of-order load/store queue structure. One embodiment of the apparatus includes a first queue for storing memory operations adapted to be executed out-of-order with respect to other memory operations. The apparatus also includes one or more additional queues for storing a memory operation in response to completion of the memory operation. The apparatus is configured to remove the memory operation from the first queue in response to the completion.
Abstract: A sense-amplifier monotizer includes an amplifier circuit and a keeper circuit. The amplifier circuit outputs a predetermined logic state while a clock signal is in a first phase, and samples a data signal and outputs at least one of the data signal and a complementary logic state of the data signal while the clock signal is in a second phase. A subsequent change of the data signal does not affect an output of the amplifier circuit once the data signal is sampled while the clock signal is in the second phase. The keeper circuit keeps a logic state of the sampled data signal once the data signal is sampled while the clock signal is in the second phase. The amplifier circuit may receive multiple data signals, and output a data signal selected by a select signal and/or a complementary value while the clock signal is in the second phase.
Type:
Grant
Filed:
December 21, 2010
Date of Patent:
April 29, 2014
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Samuel D. Naffziger, Visvesh S. Sathe, Srikanth Arekapudi
Abstract: A processor stores branch information at a “sparse” cache and a “dense” cache. The sparse cache stores the target addresses for up to a specified number of branch instructions in a given cache entry associated with a cache line address, while branch information for additional branch instructions at the cache entry is stored at the dense cache. Branch information at the dense cache persists after eviction of the corresponding cache line until it is replaced by branch information for a different cache entry. Accordingly, in response to the instructions for a given cache line address being requested for retrieval from memory, a prefetcher determines whether the dense cache stores branch information for the cache line address. If so, the prefetcher prefetches the instructions identified by the target addresses of the branch information in the dense cache concurrently with transferring the instructions associated with the cache line address.
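The sparse/dense split above can be sketched in a few lines: the sparse cache holds up to a fixed number of branch targets per cache line, overflow targets spill to the dense cache, and the prefetcher consults the dense cache when a line's instructions are fetched. The slot count and structures here are illustrative assumptions, not the patent's.

```python
SPARSE_SLOTS = 2   # assumed per-entry branch capacity of the sparse cache

class BranchCaches:
    """Sketch of a sparse/dense branch-information split: overflow branch
    targets live in the dense cache and persist past cache-line eviction."""

    def __init__(self):
        self.sparse = {}   # cache line addr -> up to SPARSE_SLOTS targets
        self.dense = {}    # overflow targets, keyed by cache line addr

    def record_branch(self, line_addr, target):
        slots = self.sparse.setdefault(line_addr, [])
        if len(slots) < SPARSE_SLOTS:
            slots.append(target)                        # fits in sparse
        else:
            self.dense.setdefault(line_addr, []).append(target)

    def prefetch_targets(self, line_addr):
        """When line_addr is requested from memory, return the extra
        branch targets the dense cache remembers so they can be
        prefetched concurrently."""
        return self.dense.get(line_addr, [])
```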
Abstract: A technique for prefetching data into a cache memory system includes prefetching data based on meta information indicative of data access patterns. A method includes tagging data of a program with meta information indicative of data access patterns. The method includes prefetching the data from main memory at least partially based on the meta information, by a processor executing the program. In at least one embodiment, the method includes generating an executable at least partially based on the meta information. The executable includes at least one instruction to prefetch the data. In at least one embodiment, the method includes inserting one or more instructions for prefetching the data into an intermediate form of program code while translating program source code into the intermediate form of program code.
Abstract: Methods, circuits and systems are provided to test data paths that traverse multiple clock domains using a common capture clock that is applied to multiple domains. Test data is launched to a first clock domain, and each of the clock domains is selected to receive the common capture clock signal while the test data propagates through the selected clock domain. The test data is captured, in response to the common capture clock, after it has propagated through each of the multiple domains. Applying a common capture clock to each of the different domains eliminates hold time errors that might otherwise occur as the data transitions from one clock domain to another.
Type:
Grant
Filed:
October 20, 2010
Date of Patent:
April 22, 2014
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Ari Shtulman, Karen Tucker, Ahmet Tokuz
Abstract: A method of manufacturing is provided that includes fabricating a first plurality of electrically functional interconnects on a front side of a first semiconductor chip and fabricating a first plurality of electrically non-functional interconnects on a back side of the first semiconductor chip. Additional chips may be stacked on the first semiconductor chip.
Type:
Grant
Filed:
March 30, 2012
Date of Patent:
April 22, 2014
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Michael Su, Bryan Black, Neil McLellan, Joe Siegel, Michael Alfano
Abstract: A method of operating a processor includes reclaiming a physical register renamed as a microcode architectural register used by a microcode routine. The physical register is reclaimed according to an indicator corresponding to the microcode architectural register and indicating that a pointer to the physical register and corresponding to the microcode architectural register is an active pointer.
Abstract: A processor is configured to support a plurality of performance states and idle states. The processor includes a first programmable location associated with a first idle state and configured to store first entry performance state (P-State) information. The first entry P-State information identifies a first entry P-State. The processor is configured to receive a request to enter the first idle state, retrieve the first entry P-State information and enter the first entry P-State. The processor may include a second programmable location associated with the first idle state and configured to store first exit P-State information. The first exit P-State information identifies a first exit P-State. The processor may be configured to receive a request to exit the first idle state, retrieve the first exit P-State information and enter the first exit P-State.
Type:
Grant
Filed:
December 21, 2010
Date of Patent:
April 22, 2014
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Kiran Bondalapati, Magiting M. Talisayon
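The P-state abstract above describes programmable locations that pair each idle state with an entry P-state and an exit P-state. A minimal sketch of that pairing, with invented state names:

```python
class PStateController:
    """Sketch of programmable entry/exit P-state locations per idle state.
    Idle-state and P-state labels here are illustrative."""

    def __init__(self):
        self.entry_pstate = {}   # idle state -> P-state entered on request
        self.exit_pstate = {}    # idle state -> P-state entered on exit
        self.current = "P0"

    def program(self, idle_state, entry, exit_):
        """Write the two programmable locations for an idle state."""
        self.entry_pstate[idle_state] = entry
        self.exit_pstate[idle_state] = exit_

    def enter_idle(self, idle_state):
        """On a request to enter the idle state, retrieve and apply the
        programmed entry P-state."""
        self.current = self.entry_pstate[idle_state]

    def exit_idle(self, idle_state):
        """On a request to exit the idle state, retrieve and apply the
        programmed exit P-state."""
        self.current = self.exit_pstate[idle_state]
```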
Abstract: In an embodiment, a device interrupt manager may be configured to receive an interrupt from a device that is assigned to a guest. The device interrupt manager may be configured to transmit an operation targeted to a memory location in a system memory to record the interrupt for a virtual processor within the guest, wherein the interrupt is to be delivered to the targeted virtual processor. In an embodiment, a virtual machine manager may be configured to detect that an interrupt has been recorded by the device interrupt manager for a virtual processor that is not currently executing. The virtual machine manager may be configured to schedule the virtual processor for execution on a hardware processor, or may prioritize the virtual processor for scheduling, in response to the interrupt.
Type:
Grant
Filed:
June 13, 2013
Date of Patent:
April 22, 2014
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Benjamin C. Serebrin, Rodney W. Schmidt, David A. Kaplan, Mark D. Hummel
Abstract: System and method embodiments for optimally allocating compute kernels to different types of processors, such as CPUs and GPUs, in a heterogeneous computer system are disclosed. These include comparing a kernel profile of a compute kernel to respective processor profiles of a plurality of processors in a heterogeneous computer system, selecting at least one processor from the plurality of processors based upon the comparing, and scheduling the compute kernel for execution in the selected at least one processor.
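The profile comparison above could be realized many ways; one minimal sketch scores each processor by a dot product of the kernel's demands against the processor's capabilities and picks the best match. The scoring function and profile fields are assumptions for illustration, not the patent's method.

```python
def select_processors(kernel_profile, processor_profiles, top_n=1):
    """Compare a kernel profile against processor profiles and return the
    names of the top_n best-matching processors. Profiles are dicts of
    feature -> weight (an illustrative encoding)."""
    def score(proc):
        # Higher when the processor is strong in the features the kernel needs.
        return sum(kernel_profile.get(feat, 0.0) * cap
                   for feat, cap in proc["caps"].items())
    ranked = sorted(processor_profiles, key=score, reverse=True)
    return [p["name"] for p in ranked[:top_n]]
```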
Abstract: A processing system monitors memory bandwidth available to transfer data from memory to a cache. In addition, the processing system monitors a prefetching accuracy for prefetched data. If the amount of available memory bandwidth is low and the prefetching accuracy is also low, prefetching can be throttled by reducing the amount of data prefetched. The prefetching can be throttled by changing the frequency of prefetching, prefetching depth, prefetching confidence levels, and the like.
Type:
Application
Filed:
October 17, 2012
Publication date:
April 17, 2014
Applicant:
Advanced Micro Devices, Inc.
Inventors:
Todd Rafacz, Marius Evers, Chitresh Narasimhaiah
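The bandwidth/accuracy throttling abstract above reduces to a small decision function: when available memory bandwidth and prefetch accuracy are both low, cut the prefetch depth. The thresholds and depth values below are illustrative assumptions.

```python
def prefetch_depth(bandwidth_free, accuracy,
                   low_bw=0.25, low_acc=0.5, full_depth=8):
    """Sketch of a throttling policy: bandwidth_free and accuracy are
    fractions in [0, 1]; the return value is how many lines ahead to
    prefetch. Thresholds are invented for illustration."""
    if bandwidth_free < low_bw and accuracy < low_acc:
        return 1                  # both low: throttle hard
    if accuracy < low_acc:
        return full_depth // 2    # inaccurate but bandwidth is cheap
    return full_depth             # accurate prefetches run at full depth
```

Real hardware might instead adjust prefetch frequency or confidence thresholds, as the abstract notes; depth is just the simplest knob to model.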
Abstract: An apparatus and methods for hardware-based performance monitoring of a computer system are presented. The apparatus includes: processing units; a memory; a connector device connecting the processing units and the memory; probes inserted into the processing units, the probes generating probe signals when selected processing events are detected; and a thread trace device connected to the connector device. The thread trace device includes an event interface to receive probe signals, and an event memory controller to send probe event messages to the memory, where probe event messages are based on probe signals. The probe event messages transferred to memory can be subsequently analyzed using a software program to determine, for example, thread-to-thread interactions.
Abstract: An integrated circuit includes a memory having an address space and a memory controller coupled to the memory for accessing the address space in response to received memory accesses. The memory controller further accesses a plurality of data elements in a first portion of the address space, and reliability data corresponding to the plurality of data elements in a second portion of the address space.
Abstract: A processor includes a store queue that stores information representing store instructions. In response to retirement of a store instruction, the processor invalidates the corresponding entry in the store queue, thereby indicating that the entry is available to store a subsequent store instruction. The store address is not removed from the queue until the subsequent store instruction is stored. Accordingly, the store address is available for comparison to a dependent load address.
Type:
Application
Filed:
October 17, 2012
Publication date:
April 17, 2014
Applicant:
Advanced Micro Devices, Inc.
Inventors:
Matthew A. Rafacz, Matthew M. Crum, Michael E. Tuuk
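The store-queue abstract above hinges on separating the valid bit from the address field: retirement clears the valid bit so the entry can be reallocated, but the address stays behind for dependent-load comparison until the entry is actually reused. A toy model, with invented structure:

```python
class StoreQueue:
    """Sketch: retiring a store invalidates its entry (making it free)
    while leaving the store address in place, so younger loads can still
    compare against it until a subsequent store overwrites the entry."""

    def __init__(self, size):
        self.valid = [False] * size
        self.addr = [None] * size

    def allocate(self, addr):
        i = self.valid.index(False)   # first invalid (free) entry
        self.valid[i] = True
        self.addr[i] = addr           # overwrites any retained stale address
        return i

    def retire(self, i):
        self.valid[i] = False         # entry reusable; address retained

    def matches(self, load_addr):
        """Entries whose retained address matches a dependent load."""
        return [i for i, a in enumerate(self.addr) if a == load_addr]
```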
Abstract: A device may receive information that identifies a first task to be processed, may determine a performance metric value indicative of a behavior of a processor while processing a second task, and may assign, based on the performance metric value, the first task to a bin for processing the first task, the bin including a set of processors that operate based on a power characteristic.
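The binning abstract above can be sketched as a lookup from a task's observed performance metric to a bin of processors sharing a power characteristic. Bin names and ranges below are illustrative assumptions.

```python
def assign_bin(metric_value, bins):
    """Assign a task to the first bin whose half-open metric range
    [lo, hi) covers its performance metric value; bins map a name
    (standing in for a set of processors with a shared power
    characteristic) to that range."""
    for name, (lo, hi) in bins.items():
        if lo <= metric_value < hi:
            return name
    return None   # no bin covers this metric
```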
Abstract: A method and apparatus for controlling the aggressiveness of a prefetcher based on thrash events is presented. An aggressiveness of a prefetcher for a cache is controlled based upon a number of thrashed cache lines that are replaced by a prefetched cache line and subsequently written back into the cache before the prefetched cache line has been accessed.
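The thrash-event policy above counts cache lines that were displaced by a prefetch and then written back before the prefetched line was ever accessed, and backs off the prefetcher's aggressiveness as the count grows. A minimal sketch; the threshold and aggressiveness levels are invented:

```python
class ThrashThrottle:
    """Sketch: each thrash event (a line replaced by a prefetch and
    written back before the prefetched line is used) bumps a counter;
    crossing the threshold lowers prefetch aggressiveness one level."""

    def __init__(self, threshold=4, max_level=3):
        self.threshold = threshold
        self.level = max_level        # current aggressiveness (0 = off)
        self.thrash_count = 0

    def on_thrash(self):
        self.thrash_count += 1
        if self.thrash_count >= self.threshold and self.level > 0:
            self.level -= 1           # back off
            self.thrash_count = 0     # restart the count at the new level
```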
Abstract: In response to determining an operation is a dependent operation, a mapper of a processor determines the source registers of the operation from which the dependent operation depends. The mapper translates the dependent operation to a new operation that uses as its source operands at least one of the determined source registers and a source register of the dependent operation. The new operation is independent of other pending operations and therefore can be executed without waiting for execution of other operations, thus reducing execution latency.
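The mapper translation above can be illustrated with its simplest case: when the producer of a dependent operation's source register is a pending register move, substituting the move's own source makes the dependent operation independent of the move. Restricting to moves is an illustrative assumption; the patent's mechanism is stated more generally.

```python
def translate_dependent(dep_op, pending_by_dst):
    """Sketch: rewrite dep_op's sources by reading through any pending
    'mov' producer, yielding a new operation that no longer waits on the
    move. Ops are dicts {'op', 'dst', 'srcs'}; pending_by_dst maps a
    destination register to its pending producer op."""
    new_srcs = []
    for src in dep_op["srcs"]:
        prod = pending_by_dst.get(src)
        if prod is not None and prod["op"] == "mov":
            new_srcs.append(prod["srcs"][0])   # use the move's source directly
        else:
            new_srcs.append(src)
    return {"op": dep_op["op"], "dst": dep_op["dst"], "srcs": new_srcs}
```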