Patents by Inventor Anthony Asaro

Anthony Asaro has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10209991
    Abstract: A system and method for reducing latencies of main memory data accesses are described. A non-blocking load (NBLD) instruction identifies an address of requested data and a subroutine. The subroutine includes instructions dependent on the requested data. A processing unit verifies that address translations are available for both the address and the subroutine. The processing unit continues processing instructions with no stalls caused by younger-in-program-order instructions waiting for the requested data. The non-blocking load unit performs a cache coherent data read request on behalf of the NBLD instruction and requests that the processing unit perform an asynchronous jump to the subroutine upon return of the requested data from lower-level memory.
    Type: Grant
    Filed: November 16, 2016
    Date of Patent: February 19, 2019
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Meenakshi Sundaram Bhaskaran, Elliot H. Mednick, David A. Roberts, Anthony Asaro, Amin Farmahini-Farahani
  • Publication number: 20190018699
    Abstract: A technique for recovering from a hang in a virtualized accelerated processing device (“APD”) is provided. In the virtualization scheme, different virtual machines are assigned different “time-slices” in which to use the APD. When a time-slice expires, the APD stops operations for a current VM and starts operations for another VM. To stop operations on the APD, a virtualization scheduler sends a request to idle the APD. The APD responds by completing work and idling. If one or more portions of the APD do not complete this idling process before a timeout expires, then a hang occurs. In response to the hang, the virtualization scheduler informs the hypervisor that a hang has occurred. The hypervisor performs a function level reset on the APD and informs the VM that the hang has occurred. The VM responds by stopping command issue to the APD and re-initializing the APD for the function.
    Type: Application
    Filed: July 28, 2017
    Publication date: January 17, 2019
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anthony Asaro, Yinan Jiang, Andy Sung, Ahmed M. Abdelkhalek, Xiaowei Wang, Sidney D. Fortes
  • Publication number: 20190004839
    Abstract: A technique for efficient time-division of resources in a virtualized accelerated processing device (“APD”) is provided. In a virtualization scheme implemented on the APD, different virtual machines are assigned different “time-slices” in which to use the APD. When a time-slice expires, the APD performs a virtualization context switch by stopping operations for a current virtual machine (“VM”) and starting operations for another VM. Typically, each VM is assigned a fixed length of time, after which a virtualization context switch is performed. This fixed length of time can lead to inefficiencies. Therefore, in some situations, in response to a VM having no more work to perform on the APD and the APD being idle, a virtualization context switch is performed “early.” This virtualization context switch is “early” in the sense that the virtualization context switch is performed before the fixed length of time for the time-slice expires.
    Type: Application
    Filed: June 29, 2017
    Publication date: January 3, 2019
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Gongxian Jeffrey Cheng, Louis Regniere, Anthony Asaro
  • Publication number: 20190004840
    Abstract: A register protection mechanism for a virtualized accelerated processing device (“APD”) is disclosed. The mechanism protects registers of the accelerated processing device designated as physical-function-or-virtual-function registers (“PF-or-VF registers”), which are single architectural instance registers that are shared among different functions that share the APD in a virtualization scheme whereby each function can maintain a different value in these registers. The protection mechanism for these registers comprises comparing the function associated with the memory address specified by a particular register access request to the “currently active” function for the APD and disallowing the register access request if a match does not occur.
    Type: Application
    Filed: June 29, 2017
    Publication date: January 3, 2019
    Applicant: ATI Technologies ULC
    Inventors: Anthony Asaro, Yinan Jiang, Kelly Donald Clark Zytaruk
  • Patent number: 10162765
    Abstract: A device may receive a direct memory access request that identifies a virtual address. The device may determine whether the virtual address is within a particular range of virtual addresses. The device may selectively perform a first action or a second action based on determining whether the virtual address is within the particular range of virtual addresses. The first action may include causing a first address translation algorithm to be performed to translate the virtual address to a physical address associated with a memory device when the virtual address is not within the particular range of virtual addresses. The second action may include causing a second address translation algorithm to be performed to translate the virtual address to the physical address when the virtual address is within the particular range of virtual addresses. The second address translation algorithm may be different from the first address translation algorithm.
    Type: Grant
    Filed: April 19, 2017
    Date of Patent: December 25, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Andrew G. Kegel, Anthony Asaro
  • Patent number: 10152434
    Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: December 11, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
  • Publication number: 20180349165
    Abstract: A technique for facilitating direct doorbell rings in a virtualized system is provided. A first device is configured to “ring” a “doorbell” of a second device, where both the first and second devices are not a host processor such as a central processing unit and are coupled to an interconnect fabric such as peripheral component interconnect express (“PCIe”). The first device is configured to ring the doorbell of the second device by writing to a doorbell address in a guest physical address space. For security reasons, a check block checks an offset portion of the doorbell address against a set of allowed doorbell addresses for doorbells specified in the guest physical address space, allowing the doorbell to be written if the doorbell is included in the set of allowed doorbell addresses.
    Type: Application
    Filed: May 31, 2017
    Publication date: December 6, 2018
    Applicant: ATI Technologies ULC
    Inventors: Anthony Asaro, Gongxian Jeffrey Cheng
  • Publication number: 20180307414
    Abstract: Systems, apparatuses, and methods for migrating memory pages are disclosed herein. In response to detecting that a migration of a first page between memory locations is being initiated, a first page table entry (PTE) corresponding to the first page is located and a migration pending indication is stored in the first PTE. In one embodiment, the migration pending indication is encoded in the first PTE by disabling read and write permissions. If a translation request targeting the first PTE is received by the MMU and the translation request corresponds to a read request, a read operation is allowed to the first page. Otherwise, if the translation request corresponds to a write request, a write operation to the first page is blocked and a silent retry request is generated and conveyed to the requesting client.
    Type: Application
    Filed: April 24, 2017
    Publication date: October 25, 2018
    Inventors: Wade K. Smith, Anthony Asaro
  • Publication number: 20180307622
    Abstract: Systems, apparatuses, and methods for implementing a virtualized translation lookaside buffer (TLB) are disclosed herein. In one embodiment, a system includes at least an execution unit and a first TLB. The system supports the execution of a plurality of virtual machines in a virtualization environment. The system detects a translation request generated by a first virtual machine with a first virtual memory identifier (VMID). The translation request is conveyed from the execution unit to the first TLB. The first TLB performs a lookup of its cache using at least a portion of a first virtual address and the first VMID. If the lookup misses in the cache, the first TLB allocates an entry which is addressable by the first virtual address and the first VMID, and the first TLB sends the translation request with the first VMID to a second TLB.
    Type: Application
    Filed: April 24, 2017
    Publication date: October 25, 2018
    Inventors: Wade K. Smith, Anthony Asaro
  • Publication number: 20180307619
    Abstract: A system including a gasket communicatively coupled between a unified northbridge (UNB) having a cache coherent interconnect (CCI) interface and a processor having an Advanced eXtensible Interface (AXI) coherency extension (ACE). The gasket is configured to translate requests from the processor that include ACE commands into equivalent CCI commands, wherein each request from the processor maps onto a specific CCI request type. The gasket is further configured to translate ACE tags into CCI tags. The gasket is further configured to translate CCI encoded probes from a system resource interface (SRI) into equivalent ACE snoop transactions. The gasket is further configured to translate the memory map to inter-operate with a UNB/coherent HyperTransport (cHT) environment. The gasket is further configured to receive a barrier transaction that is used to provide ordering for transactions.
    Type: Application
    Filed: July 2, 2018
    Publication date: October 25, 2018
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri
  • Publication number: 20180300253
    Abstract: Systems, apparatuses, and methods for implementing a translate further mechanism are disclosed herein. In one embodiment, a processor detects a hit to a first entry of a page table structure during a first lookup to the page table structure. The processor retrieves a page table entry address from the first entry and uses this address to perform a second lookup to the page table structure responsive to detecting a first indication in the first entry. The processor retrieves a physical address from the first entry and uses the physical address to access the memory subsystem responsive to not detecting the first indication in the first entry. In one embodiment, the first indication is a translate further bit being set. In another embodiment, the first indication is a page directory entry as page table entry field not being activated.
    Type: Application
    Filed: April 13, 2017
    Publication date: October 18, 2018
    Inventors: Wade K. Smith, Anthony Asaro, Dhirendra Partap Singh Rana
  • Patent number: 10025721
    Abstract: The present invention provides for page table access and dirty bit management in hardware via a new atomic test[0] and OR and Mask. The present invention also provides for a gasket that enables ACE to CCI translations. This gasket further provides request translation between ACE and CCI, deadlock avoidance for victim and probe collision, ARM barrier handling, and power management interactions. The present invention also provides a solution for ARM victim/probe collision handling which deadlocks the unified northbridge. These solutions include a dedicated writeback virtual channel, probes for IO requests using 4-hop protocol, and a WrBack Reorder Ability in MCT where victims update older requests with data as they pass the requests.
    Type: Grant
    Filed: October 24, 2014
    Date of Patent: July 17, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri
  • Publication number: 20180181488
    Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.
    Type: Application
    Filed: December 23, 2016
    Publication date: June 28, 2018
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Mark Fowler, Jimshed Mirza, Anthony Asaro
  • Publication number: 20180173649
    Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.
    Type: Application
    Filed: December 20, 2016
    Publication date: June 21, 2018
    Inventors: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
  • Patent number: 9965392
    Abstract: Existing multiprocessor computing systems often have insufficient memory coherency and, consequently, are unable to efficiently utilize separate memory systems. Specifically, a CPU cannot effectively write to a block of memory and then have a GPU access that memory unless there is explicit synchronization. In addition, because the GPU is forced to statically split memory locations between itself and the CPU, existing multiprocessor computing systems are unable to efficiently utilize the separate memory systems. Embodiments described herein overcome these deficiencies by receiving a notification within the GPU that the CPU has finished processing data that is stored in coherent memory, and invalidating data in the CPU caches that the GPU has finished processing from the coherent memory. Embodiments described herein also include dynamically partitioning a GPU memory into coherent memory and local memory through use of a probe filter.
    Type: Grant
    Filed: August 24, 2016
    Date of Patent: May 8, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
  • Patent number: 9959593
    Abstract: An apparatus includes a unified system/graphics memory and a memory controller. The memory controller is operative to receive client data access requests associated with one or more clients and a central processing unit (CPU) data access request associated with a CPU, to a plurality of memory channels for accessing the unified system/graphics memory. The memory controller is operative to provide access to the plurality of memory channels, in parallel, by the CPU and at least one client of the one or more clients. The memory controller is operative to prioritize the CPU data access request to the unified memory over the client data access requests to the unified memory and control the plurality of memory channels to access, in parallel, data for the CPU and data for the at least one client based on a request of the client data access requests and the CPU data access request.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: May 1, 2018
    Assignee: ATI Technologies ULC
    Inventors: Milivoje Aleksic, Raymond M. Li, Danny H. M. Cheng, Carl K. Mizuyabu, Anthony Asaro
  • Patent number: 9910788
    Abstract: A processor device includes a cache and a memory storing a set of counters. Each counter of the set is associated with a corresponding block of a plurality of blocks of the cache. The processor device further includes a cache access monitor to, for each time quantum for a series of one or more time quanta, increment counter values of the set of counters based on accesses to the corresponding blocks of the cache. The processor device further includes a transfer engine to, after completion of each time quantum, transfer the counter values of the set of counters for the time quantum to a corresponding location in a system memory.
    Type: Grant
    Filed: September 22, 2015
    Date of Patent: March 6, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Philip J. Rogers, Benjamin T. Sander, Anthony Asaro
  • Publication number: 20180018221
    Abstract: In one form, a memory controller includes a command queue, an arbiter, and a replay queue. The command queue receives and stores memory access requests. The arbiter is coupled to the command queue for providing a sequence of memory commands to a memory channel. The replay queue stores the sequence of memory commands to the memory channel, and continues to store memory access commands that have not yet received responses from the memory channel. When a response indicates a completion of a corresponding memory command without any error, the replay queue removes the corresponding memory command without taking further action. When a response indicates a completion of the corresponding memory command with an error, the replay queue replays at least the corresponding memory command. In another form, a data processing system includes the memory controller, a memory accessing agent, and a memory system to which the memory controller is coupled.
    Type: Application
    Filed: December 9, 2016
    Publication date: January 18, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: James R. Magro, Ruihua Peng, Anthony Asaro, Kedarnath Balakrishnan, Scott P. Murphy, YuBin Yao
  • Publication number: 20180011798
    Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.
    Type: Application
    Filed: September 5, 2017
    Publication date: January 11, 2018
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
  • Publication number: 20170301058
    Abstract: An apparatus includes a unified system/graphics memory and a memory controller. The memory controller is operative to receive client data access requests associated with one or more clients and a central processing unit (CPU) data access request associated with a CPU, to a plurality of memory channels for accessing the unified system/graphics memory. The memory controller is operative to provide access to the plurality of memory channels, in parallel, by the CPU and at least one client of the one or more clients. The memory controller is operative to prioritize the CPU data access request to the unified memory over the client data access requests to the unified memory and control the plurality of memory channels to access, in parallel, data for the CPU and data for the at least one client based on a request of the client data access requests and the CPU data access request.
    Type: Application
    Filed: June 30, 2017
    Publication date: October 19, 2017
    Inventors: Milivoje Aleksic, Raymond M. Li, Danny H. M. Cheng, Carl K. Mizuyabu, Anthony Asaro
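
Illustrative sketch (not part of any patent record): the replay-queue behavior summarized in the abstract of publication 20180018221 can be modeled in a few lines of Python. This is a minimal toy model under stated assumptions, not the patented design: the class name ReplayQueue, the methods issue and on_response, the integer command ids, and the command strings are all hypothetical, and the sketch replays only the failing command even though the abstract says "at least" that command is replayed.

```python
from collections import OrderedDict


class ReplayQueue:
    """Toy model: holds issued memory commands until they complete without error."""

    def __init__(self):
        # Commands still awaiting a clean response, kept in issue order.
        # Keys are hypothetical command ids chosen for this sketch.
        self.pending = OrderedDict()

    def issue(self, cmd_id, command):
        """Record a command that the arbiter has sent to the memory channel."""
        self.pending[cmd_id] = command

    def on_response(self, cmd_id, error):
        """Handle a response; return the commands to replay (empty on success)."""
        if not error:
            # Completed without error: remove it and take no further action.
            del self.pending[cmd_id]
            return []
        # Completed with an error: replay the failing command.
        # It stays in the queue because it still has no clean response.
        return [self.pending[cmd_id]]


queue = ReplayQueue()
queue.issue(1, "READ 0x100")
queue.issue(2, "WRITE 0x200")
ok_replays = queue.on_response(1, error=False)   # command 1 is retired
bad_replays = queue.on_response(2, error=True)   # command 2 must be replayed
```

After these calls, command 1 has left the queue, while command 2 is both returned for replay and still held as pending, which mirrors the two response cases the abstract distinguishes.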