Patents by Inventor Nuwan S. Jayasena

Nuwan S. Jayasena has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140181421
    Abstract: A system includes an atomic processing engine (APE) coupled to an interconnect. The interconnect is to couple to one or more processor cores. The APE receives a plurality of commands from the one or more processor cores through the interconnect. In response to a first command, the APE performs a first plurality of operations associated with the first command. The first plurality of operations references multiple memory locations, at least one of which is shared between two or more threads executed by the one or more processor cores.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: James M. O'CONNOR, Michael J. Schulte, Nuwan S. Jayasena, Gabriel H. Loh
  • Publication number: 20140181457
    Abstract: A system, method, and memory device embodying some aspects of the present invention for remapping external memory addresses and internal memory locations in stacked memory are provided. The stacked memory includes one or more memory layers configured to store data. The stacked memory also includes a logic layer connected to the memory layer. The logic layer has an Input/Output (I/O) port configured to receive read and write commands from external devices, a memory map configured to maintain an association between external memory addresses and internal memory locations, and a controller coupled to the I/O port, memory map, and memory layers, configured to store data received from external devices to internal memory locations.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Lisa R. HSU, Gabriel H. LOH, Michael IGNATOWSKI, Michael J. SCHULTE, Nuwan S. JAYASENA, James M. O'CONNOR
  • Publication number: 20140181414
    Abstract: A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response a write request to a second bank for data indicated to be stored in the powered down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, then the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, then the cache controller stores the data with a clean state in the second bank. The cache controller writes the data in a lower-level memory for both cases.
    Type: Application
    Filed: October 16, 2013
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Yasuko Eckert, Gabriel H. Loh, Mauricio Breternitz, James M. O'Connor, Srilatha Manne, Nuwan S. Jayasena, Mithuna S. Thottethodi
  • Publication number: 20140176187
    Abstract: A die-stacked memory device incorporates a reconfigurable logic device to provide implementation flexibility in performing various data manipulation operations and other memory operations that use data stored in the die-stacked memory device or that result in data that is to be stored in the die-stacked memory device. One or more configuration files representing corresponding logic configurations for the reconfigurable logic device can be stored in a configuration store at the die-stacked memory device, and a configuration controller can program a reconfigurable logic fabric of the reconfigurable logic device using a selected one of the configuration files. Due to the integration of the logic dies and the memory dies, the reconfigurable logic device can perform various data manipulation operations with higher bandwidth and lower latency and power consumption compared to devices external to the die-stacked memory device.
    Type: Application
    Filed: December 23, 2012
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan S. Jayasena, Michael J. Schulte, Gabriel H. Loh, Michael Ignatowski
  • Publication number: 20140181427
    Abstract: Some die-stacked memories will contain a logic layer in addition to one or more layers of DRAM (or other memory technology). This logic layer may be a discrete logic die or logic on a silicon interposer associated with a stack of memory dies. Additional circuitry/functionality is placed on the logic layer to implement functionality to perform various data movement and address calculation operations. This functionality would allow compound memory operations—a single request communicated to the memory that characterizes the accesses and movement of many data items. This eliminates the performance and power overheads associated with communicating address and control information on a fine-grain, per-data-item basis from a host processor (or other device) to the memory. This approach also provides better visibility of macro-level memory access patterns to the memory system and may enable additional optimizations in scheduling memory accesses.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan S. JAYASENA, James M. O'Connor, Gabriel H. Loh, Michael J. Schulte, Bradford M. Beckmann, Michael Ignatowski
  • Publication number: 20140181458
    Abstract: A die-stacked memory device incorporates a data translation controller at one or more logic dies of the device to provide data translation services for data to be stored at, or retrieved from, the die-stacked memory device. The data translation operations implemented by the data translation controller can include compression/decompression operations, encryption/decryption operations, format translations, wear-leveling translations, data ordering operations, and the like. Due to the tight integration of the logic dies and the memory dies, the data translation controller can perform data translation operations with higher bandwidth and lower latency and power consumption compared to operations performed by devices external to the die-stacked memory device.
    Type: Application
    Filed: December 23, 2012
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Michael Ignatowski, Michael J. Schulte, Lisa R. Hsu, Nuwan S. Jayasena
  • Publication number: 20140173216
    Abstract: Embodiments include methods, systems, and articles of manufacture directed to identifying transient data upon storing the transient data in a cache memory, and invalidating the identified transient data in the cache memory.
    Type: Application
    Filed: December 18, 2012
    Publication date: June 19, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan S. JAYASENA, Mark D. HILL
  • Publication number: 20140156975
    Abstract: In some embodiments, a method for improving reliability in a processor is provided. The method can include replicating input data for first and second lanes of a processor, the first and second lanes being located in a same cluster of the processor and the first and second lanes each generating a respective value associated with an instruction to be executed in the respective lane, and responsive to a determination that the generated values do not match, providing an indication that the generated values do not match.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 5, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Vilas SRIDHARAN, James M. O'Connor, Steven K. Reinhardt, Nuwan S. Jayasena, Michael J. Schulte, Dean A. Liberty
  • Publication number: 20140149677
    Abstract: Embodiments include methods, systems and computer readable media configured to execute a first kernel (e.g. compute or graphics kernel) with reduced intermediate state storage resource requirements. These include executing a first and second (e.g. prefetch) kernel on a data-parallel processor, such that the second kernel begins executing before the first kernel. The second kernel performs memory operations that are based upon at least a subset of memory operations in the first kernel.
    Type: Application
    Filed: November 26, 2012
    Publication date: May 29, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan S. JAYASENA, James Michael O'CONNOR, Michael MANTOR
  • Publication number: 20140149772
    Abstract: The described embodiments include a computing device with one or more entities (processor cores, processors, etc.). In some embodiments, during operation, a thermal power management unit in the computing device uses a linear prediction to compute a predicted duration of a next idle period for an entity based on the duration of one or more previous idle periods for the entity. Based on the predicted duration of the next idle period, the thermal power management unit configures the entity to operate in a corresponding idle state.
    Type: Application
    Filed: November 8, 2013
    Publication date: May 29, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Manish Arora, Nuwan S. Jayasena, Yasuko Eckert, Madhu Saravana Sibi Govindan, William L. Bircher, Michael J. Schulte, Srilatha Manne
  • Patent number: 8707314
    Abstract: A system and method embodiments for optimally allocating compute kernels to different types of processors, such as CPUs and GPUs, in a heterogeneous computer system are disclosed. These include comparing a kernel profile of a compute kernel to respective processor profiles of a plurality of processors in a heterogeneous computer system, selecting at least one processor from the plurality of processors based upon the comparing, and scheduling the compute kernel for execution in the selected at least one processor.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: April 22, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jayanth Gummaraju, Nuwan S. Jayasena
  • Publication number: 20140101405
    Abstract: Methods and apparatuses are provided for avoiding cold translation lookaside buffer (TLB) misses in a computer system. A typical system is configured as a heterogeneous computing system having at least one central processing unit (CPU) and one or more graphic processing units (GPUs) that share a common memory address space. Each processing unit (CPU and GPU) has an independent TLB. When offloading a task from a particular CPU to a particular GPU, translation information is sent along with the task assignment. The translation information allows the GPU to load the address translation data into the TLB associated with the one or more GPUs prior to executing the task. Preloading the TLB of the GPUs reduces or avoids cold TLB misses that could otherwise occur without the benefits offered by the present disclosure.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Misel-Myrto Papadopoulou, Lisa R. Hsu, Andrew G. Kegel, Nuwan S. Jayasena, Bradford M. Beckmann, Steven K. Reinhardt
  • Publication number: 20140040532
    Abstract: A processing system comprises one or more processor devices and other system components coupled to a stacked memory device having a set of stacked memory layers and a set of one or more logic layers. The set of logic layers implements a helper processor that executes instructions to perform tasks in response to a task request from the processor devices or otherwise on behalf of the other processor devices. The set of logic layers also includes a memory interface coupled to memory cell circuitry implemented in the set of stacked memory layers and coupleable to the processor devices. The memory interface operates to perform memory accesses for the processor devices and for the helper processor. By virtue of the helper processor's tight integration with the stacked memory layers, the helper processor may perform certain memory-intensive operations more efficiently than could be performed by the external processor devices.
    Type: Application
    Filed: August 6, 2012
    Publication date: February 6, 2014
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Yasuko Watanabe, Gabriel H. Loh, James M. O'Connor, Michael Ignatowski, Nuwan S. Jayasena
  • Publication number: 20140022263
    Abstract: The desire to use an Accelerated Processing Device (APD) for general computation has increased due to the APD's exemplary performance characteristics. However, current systems incur high overhead when dispatching work to the APD because a process cannot be efficiently identified or preempted. The occupying of the APD by a rogue process for arbitrary amounts of time can prevent the effective utilization of the available system capacity and can reduce the processing progress of the system. Embodiments described herein can overcome this deficiency by enabling the system software to pre-empt a process executing on the APD for any reason. The APD provides an interface for initiating such a pre-emption. This interface exposes an urgency of the request which determines whether the process being preempted is allowed a grace period to complete its issued work before being forced off the hardware.
    Type: Application
    Filed: July 23, 2012
    Publication date: January 23, 2014
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Kevin McGrath, Sebastien Nussbaum, Nuwan S. Jayasena, Rex Eldon McCrary, Mark Leather, Philip J. Rogers
  • Publication number: 20130159812
    Abstract: According to one embodiment, a memory architecture implemented method is provided, where the memory architecture includes a logic chip and one or more memory chips on a single die, and where the method comprises: reading values of data from the one or more memory chips to the logic chip, where the one or more memory chips and the logic chip are on a single die; modifying, via the logic chip on the single die, the values of data; and writing, from the logic chip to the one or more memory chips, the modified values of data.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Gabriel H. LOH, James M. O'Connor, Michael Ignatowski, Nuwan S. Jayasena, Bradford M. Beckmann
  • Publication number: 20130160016
    Abstract: A system and method embodiments for optimally allocating compute kernels to different types of processors, such as CPUs and GPUs, in a heterogeneous computer system are disclosed. These include comparing a kernel profile of a compute kernel to respective processor profiles of a plurality of processors in a heterogeneous computer system, selecting at least one processor from the plurality of processors based upon the comparing, and scheduling the compute kernel for execution in the selected at least one processor.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Jayanth GUMMARAJU, Nuwan S. JAYASENA
  • Publication number: 20130160017
    Abstract: Embodiments describe herein provide a method of for managing task scheduling on a accelerated processing device. The method includes executing a first task within the accelerated processing device (APD), monitoring for an interruption of the execution of the first task, and switching to a second task when an interruption is detected.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Inventors: Robert Scott HARTOG, Ralph Clay Taylor, Michael Mantor, Thomas Roy Woller, Kevin McGrath, Sebastien Nussbaum, Nuwan S. Jayasena, Rex McCrary, Philip J. Rogers, Mark Leather
  • Publication number: 20120198458
    Abstract: Embodiments of the present invention provide a method of synchronous operation of a first processing device and a second processing device. The method includes executing a process on the first processing device, responsive to a determination that execution of the process on the first device has reached a serial-parallel boundary, passing an execution thread of the process from the first processing device to the second processing device, and executing the process on the second processing device.
    Type: Application
    Filed: November 30, 2011
    Publication date: August 2, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Nuwan S. Jayasena, Kevin McGrath, Philip j. Rogers, Thomas Woller