Patents by Inventor Anthony Asaro
Anthony Asaro has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10209991Abstract: A system and method for reducing latencies of main memory data accesses are described. A non-blocking load (NBLD) instruction identifies an address of requested data and a subroutine. The subroutine includes instructions dependent on the requested data. A processing unit verifies that address translations are available for both the address and the subroutine. The processing unit continues processing instructions with no stalls caused by younger-in-program-order instructions waiting for the requested data. The non-blocking load unit performs a cache coherent data read request on behalf of the NBLD instruction and requests that the processing unit perform an asynchronous jump to the subroutine upon return of the requested data from lower-level memory.Type: GrantFiled: November 16, 2016Date of Patent: February 19, 2019Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Meenakshi Sundaram Bhaskaran, Elliot H. Mednick, David A. Roberts, Anthony Asaro, Amin Farmahini-Farahani
-
Publication number: 20190018699Abstract: A technique for recovering from a hang in a virtualized accelerated processing device (“APD”) is provided. In the virtualization scheme, different virtual machines are assigned different “time-slices” in which to use the APD. When a time-slice expires, the APD stops operations for a current VM and starts operations for another VM. To stop operations on the APD, a virtualization scheduler sends a request to idle the APD. The APD responds by completing work and idling. If one or more portions of the APD do not complete this idling process before a timeout expires, then a hang occurs. In response to the hang, the virtualization scheduler informs the hypervisor that a hang has occurred. The hypervisor performs a function level reset on the APD and informs the VM that the hang has occurred. The VM responds by stopping command issue to the APD and re-initializing the APD for the function.Type: ApplicationFiled: July 28, 2017Publication date: January 17, 2019Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Anthony Asaro, Yinan Jiang, Andy Sung, Ahmed M. Abdelkhalek, Xiaowei Wang, Sidney D. Fortes
-
Publication number: 20190004839Abstract: A technique for efficient time-division of resources in a virtualized accelerated processing device (“APD”) is provided. In a virtualization scheme implemented on the APD, different virtual machines are assigned different “time-slices” in which to use the APD. When a time-slice expires, the APD performs a virtualization context switch by stopping operations for a current virtual machine (“VM”) and starting operations for another VM. Typically, each VM is assigned a fixed length of time, after which a virtualization context switch is performed. This fixed length of time can lead to inefficiencies. Therefore, in some situations, in response to a VM having no more work to perform on the APD and the APD being idle, a virtualization context switch is performed “early.” This virtualization context switch is “early” in the sense that the virtualization context switch is performed before the fixed length of time for the time-slice expires.Type: ApplicationFiled: June 29, 2017Publication date: January 3, 2019Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Gongxian Jeffrey Cheng, Louis Regniere, Anthony Asaro
-
Publication number: 20190004840Abstract: A register protection mechanism for a virtualized accelerated processing device (“APD”) is disclosed. The mechanism protects registers of the accelerated processing device designated as physical-function-or-virtual-function registers (“PF-or-VF* registers”), which are single architectural instance registers that are shared among different functions that share the APD in a virtualization scheme whereby each function can maintain a different value in these registers. The protection mechanism for these registers comprises comparing the function associated with the memory address specified by a particular register access request to the “currently active” function for the APD and disallowing the register access request if a match does not occur.Type: ApplicationFiled: June 29, 2017Publication date: January 3, 2019Applicant: ATI Technologies ULCInventors: Anthony Asaro, Yinan Jiang, Kelly Donald Clark Zytaruk
-
Patent number: 10162765Abstract: A device may receive a direct memory access request that identifies a virtual address. The device may determine whether the virtual address is within a particular range of virtual addresses. The device may selectively perform a first action or a second action based on determining whether the virtual address is within the particular range of virtual addresses. The first action may include causing a first address translation algorithm to be performed to translate the virtual address to a physical address associated with a memory device when the virtual address is not within the particular range of virtual addresses. The second action may include causing a second address translation algorithm to be performed to translate the virtual address to the physical address when the virtual address is within the particular range of virtual addresses. The second address translation algorithm may be different from the first address translation algorithm.Type: GrantFiled: April 19, 2017Date of Patent: December 25, 2018Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Andrew G. Kegel, Anthony Asaro
-
Patent number: 10152434Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.Type: GrantFiled: December 20, 2016Date of Patent: December 11, 2018Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
-
Publication number: 20180349165Abstract: A technique for facilitating direct doorbell rings in a virtualized system is provided. A first device is configured to “ring” a “doorbell” of a second device, where both the first and second devices are not a host processor such as a central processing unit and are coupled to an interconnect fabric such as peripheral component interconnect express (“PCIe”). The first device is configured to ring the doorbell of the second device by writing to a doorbell address in a guest physical address space. For security reasons, a check block checks an offset portion of the doorbell address against a set of allowed doorbell addresses for doorbells specified in the guest physical address space, allowing the doorbell to be written if the doorbell is included in the set of allowed doorbell addresses.Type: ApplicationFiled: May 31, 2017Publication date: December 6, 2018Applicant: ATI Technologies ULCInventors: Anthony Asaro, Gongxian Jeffrey Cheng
-
Publication number: 20180307414Abstract: Systems, apparatuses, and methods for migrating memory pages are disclosed herein. In response to detecting that a migration of a first page between memory locations is being initiated, a first page table entry (PTE) corresponding to the first page is located and a migration pending indication is stored in the first PTE. In one embodiment, the migration pending indication is encoded in the first PTE by disabling read and write permissions. If a translation request targeting the first PTE is received by the MMU and the translation request corresponds to a read request, a read operation is allowed to the first page. Otherwise, if the translation request corresponds to a write request, a write operation to the first page is blocked and a silent retry request is generated and conveyed to the requesting client.Type: ApplicationFiled: April 24, 2017Publication date: October 25, 2018Inventors: Wade K. Smith, Anthony Asaro
-
Publication number: 20180307622Abstract: Systems, apparatuses, and methods for implementing a virtualized translation lookaside buffer (TLB) are disclosed herein. In one embodiment, a system includes at least an execution unit and a first TLB. The system supports the execution of a plurality of virtual machines in a virtualization environment. The system detects a translation request generated by a first virtual machine with a first virtual memory identifier (VMID). The translation request is conveyed from the execution unit to the first TLB. The first TLB performs a lookup of its cache using at least a portion of a first virtual address and the first VMID. If the lookup misses in the cache, the first TLB allocates an entry which is addressable by the first virtual address and the first VMID, and the first TLB sends the translation request with the first VMID to a second TLB.Type: ApplicationFiled: April 24, 2017Publication date: October 25, 2018Inventors: Wade K. Smith, Anthony Asaro
-
Publication number: 20180307619Abstract: A system including a gasket communicatively coupled between a unified northbridge (UNB) having a cache coherent interconnect (CCI) interface and a processor having an Advanced eXtensible Interface (AXI) coherency extension (ACE). The gasket is configured to translate requests from the processor that include ACE commands into equivalent CCI commands, wherein each request from the processor maps onto a specific CCI request type. The gasket is further configured to translate ACE tags into CCI tags. The gasket is further configured to translate CCI encoded probes from a system resource interface (SRI) into equivalent ACE snoop transactions. The gasket is further configured to translate the memory map to inter-operate with a UNB/coherent HyperTransport (cHT) environment. The gasket is further configured to receive a barrier transaction that is used to provide ordering for transactions.Type: ApplicationFiled: July 2, 2018Publication date: October 25, 2018Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri
-
Publication number: 20180300253Abstract: Systems, apparatuses, and methods for implementing a translate further mechanism are disclosed herein. In one embodiment, a processor detects a hit to a first entry of a page table structure during a first lookup to the page table structure. The processor retrieves a page table entry address from the first entry and uses this address to perform a second lookup to the page table structure responsive to detecting a first indication in the first entry. The processor retrieves a physical address from the first entry and uses the physical address to access the memory subsystem responsive to not detecting the first indication in the first entry. In one embodiment, the first indication is a translate further bit being set. In another embodiment, the first indication is a page directory entry as page table entry field not being activated.Type: ApplicationFiled: April 13, 2017Publication date: October 18, 2018Inventors: Wade K. Smith, Anthony Asaro, Dhirendra Partap Singh Rana
-
Patent number: 10025721Abstract: The present invention provides for page table access and dirty bit management in hardware via a new atomic test[0] and OR and Mask. The present invention also provides for a gasket that enables ACE to CCI translations. This gasket further provides request translation between ACE and CCI, deadlock avoidance for victim and probe collision, ARM barrier handling, and power management interactions. The present invention also provides a solution for ARM victim/probe collision handling which deadlocks the unified northbridge. These solutions includes a dedicated writeback virtual channel, probes for IO requests using 4-hop protocol, and a WrBack Reorder Ability in MCT where victims update older requests with data as they pass the requests.Type: GrantFiled: October 24, 2014Date of Patent: July 17, 2018Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri
-
Publication number: 20180181488Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.Type: ApplicationFiled: December 23, 2016Publication date: June 28, 2018Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Mark Fowler, Jimshed Mirza, Anthony Asaro
-
Publication number: 20180173649Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.Type: ApplicationFiled: December 20, 2016Publication date: June 21, 2018Inventors: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
-
Patent number: 9965392Abstract: Existing multiprocessor computing systems often have insufficient memory coherency and, consequently, are unable to efficiently utilize separate memory systems. Specifically, a CPU cannot effectively write to a block of memory and then have a GPU access that memory unless there is explicit synchronization. In addition, because the GPU is forced to statically split memory locations between itself and the CPU, existing multiprocessor computing systems are unable to efficiently utilize the separate memory systems. Embodiments described herein overcome these deficiencies by receiving a notification within the GPU that the CPU has finished processing data that is stored in coherent memory, and invalidating data in the CPU caches that the GPU has finished processing from the coherent memory. Embodiments described herein also include dynamically partitioning a GPU memory into coherent memory and local memory through use of a probe filter.Type: GrantFiled: August 24, 2016Date of Patent: May 8, 2018Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
-
Patent number: 9959593Abstract: An apparatus includes a unified system/graphics memory and a memory controller. The memory controller is operative to receive client data access requests associated with one or more clients and a central processing unit (CPU) data access request associated with a CPU, to a plurality of memory channels for accessing the unified system/graphics memory. The memory controller is operative to provide access to the plurality of memory channels, in parallel, by the CPU and at least one client of the one or more clients. The memory controller is operative to prioritize the CPU data access request to the unified memory over the client data access requests to the unified memory and control the plurality of memory channels to access, in parallel, data for the CPU and data for the at least one client based on a request of the client data access requests and the CPU data access request.Type: GrantFiled: June 30, 2017Date of Patent: May 1, 2018Assignee: ATI Technologies ULCInventors: Milivoje Aleksic, Raymond M. Li, Danny H. M. Cheng, Carl K. Mizuyabu, Anthony Asaro
-
Patent number: 9910788Abstract: A processor device includes a cache and a memory storing a set of counters. Each counter of the set is associated with a corresponding block of a plurality of blocks of the cache. The processor device further includes a cache access monitor to, for each time quantum for a series of one or more time quanta, increment counter values of the set of counters based on accesses to the corresponding blocks of the cache. The processor device further includes a transfer engine to, after completion of each time quantum, transfer the counter values of the set of counters for the time quantum to a corresponding location in a system memory.Type: GrantFiled: September 22, 2015Date of Patent: March 6, 2018Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Philip J. Rogers, Benjamin T. Sander, Anthony Asaro
-
Publication number: 20180018221Abstract: In one form, a memory controller includes a command queue, an arbiter, and a replay queue. The command queue receives and stores memory access requests. The arbiter is coupled to the command queue for providing a sequence of memory commands to a memory channel. The replay queue stores the sequence of memory commands to the memory channel, and continues to store memory access commands that have not yet received responses from the memory channel. When a response indicates a completion of a corresponding memory command without any error, the replay queue removes the corresponding memory command without taking further action. When a response indicates a completion of the corresponding memory command with an error, the replay queue replays at least the corresponding memory command. In another form, a data processing system includes the memory controller, a memory accessing agent, and a memory system to which the memory controller is coupled.Type: ApplicationFiled: December 9, 2016Publication date: January 18, 2018Applicant: Advanced Micro Devices, Inc.Inventors: James R. Magro, Ruihua Peng, Anthony Asaro, Kedarnath Balakrishnan, Scott P. Murphy, YuBin Yao
-
Publication number: 20180011798Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.Type: ApplicationFiled: September 5, 2017Publication date: January 11, 2018Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Anthony ASARO, Kevin NORMOYLE, Mark HUMMEL
-
Publication number: 20170301058Abstract: An apparatus includes a unified system/graphics memory and a memory controller. The memory controller is operative to receive client data access requests associated with one or more clients and a central processing unit (CPU) data access request associated with a CPU, to a plurality of memory channels for accessing the unified system/graphics memory. The memory controller is operative to provide access to the plurality of memory channels, in parallel, by the CPU and at least one client of the one or more clients. The memory controller is operative to prioritize the CPU data access request to the unified memory over the client data access requests to the unified memory and control the plurality of memory channels to access, in parallel, data for the CPU and data for the at least one client based on a request of the client data access requests and the CPU data access request.Type: ApplicationFiled: June 30, 2017Publication date: October 19, 2017Inventors: Milivoje Aleksic, Raymond M. Li, Danny H.M. Cheng, Carl K. Mizuyabu, Anthony Asaro