Patents by Inventor Jimshed Mirza

Jimshed Mirza has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

LAST USE CACHE POLICY

Publication number: 20240111681

Abstract: A processor for implementing a last use cache policy is configured to access data in a portion of a cache, determine that the data in the portion of the cache is no longer needed, and mark the data in the portion of the cache as non-dirty responsive to the determining that the data in the portion of the cache is no longer needed. The marking of the data as non-dirty is indicative that the data in the portion of the cache is not to be evicted from the cache to a memory.
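
A toy software model makes the policy concrete. It is a sketch under stated assumptions, not the patented hardware: `CacheLine`, `Cache`, and the `last_use` flag are hypothetical names, and eviction is reduced to a single method. It shows the key effect. Once a line's final read clears its dirty bit, evicting that line skips the write-back to memory.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    data: int
    dirty: bool = False


class Cache:
    """Toy write-back cache keyed by address (hypothetical model)."""

    def __init__(self) -> None:
        self.lines: dict[int, CacheLine] = {}
        self.memory: dict[int, int] = {}
        self.writebacks = 0

    def write(self, addr: int, value: int) -> None:
        # A write leaves the line dirty, so eviction would normally write it back.
        self.lines[addr] = CacheLine(value, dirty=True)

    def read(self, addr: int, last_use: bool = False) -> int:
        line = self.lines[addr]
        if last_use:
            # Last use: the data will never be read again, so clear the dirty
            # bit; eviction can then drop the line without a write-back.
            line.dirty = False
        return line.data

    def evict(self, addr: int) -> None:
        line = self.lines.pop(addr)
        if line.dirty:
            self.memory[addr] = line.data
            self.writebacks += 1


cache = Cache()
cache.write(0x40, 7)  # temporary data
cache.write(0x80, 9)  # data still needed later
cache.read(0x40, last_use=True)
cache.evict(0x40)  # dropped without a write-back
cache.evict(0x80)  # written back to memory
```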

Type: Application

Filed: September 29, 2022

Publication date: April 4, 2024

Inventor: Jimshed Mirza

GRAPHICS DISCARD ENGINE

Publication number: 20230206559

Abstract: Systems, apparatuses, and methods for implementing a discard engine in a graphics pipeline are disclosed. A system includes a graphics pipeline with a geometry engine launching shaders that generate attribute data for vertices of each primitive of a set of primitives. The attribute data is consumed by pixel shaders, with each pixel shader generating a deallocation message when the pixel shader no longer needs the attribute data. A discard engine gathers deallocations from multiple pixel shaders and determines when the attribute data is no longer needed. Once a block of attributes has been consumed by all potential pixel shader consumers, the discard engine deallocates the given block of attributes. The discard engine sends a discard command to the caches so that the attribute data can be invalidated and not written back to memory.
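
The deallocation-gathering step can be sketched as a per-block counter. The class name, the per-block consumer counts, and the `discarded` list (standing in for the discard command sent to the caches) are hypothetical; the abstract does not say how the engine learns how many consumers each block has.

```python
from collections import defaultdict


class DiscardEngine:
    """Hypothetical model of deallocation gathering for attribute blocks."""

    def __init__(self, consumers_per_block: dict[int, int]) -> None:
        # How many pixel shaders may read each attribute block (assumed known).
        self.expected = dict(consumers_per_block)
        self.seen = defaultdict(int)
        # Blocks freed so far; each entry stands in for a discard command that
        # tells the caches to invalidate the block without writing it back.
        self.discarded: list[int] = []

    def deallocate(self, block: int) -> None:
        """Record one pixel shader's deallocation message for `block`."""
        self.seen[block] += 1
        if self.seen[block] == self.expected[block]:
            self.discarded.append(block)


engine = DiscardEngine({0: 2, 1: 3})
engine.deallocate(0)
engine.deallocate(1)
engine.deallocate(0)  # second and last consumer of block 0
```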

Type: Application

Filed: December 27, 2021

Publication date: June 29, 2023

Inventors: Christopher J. Brennan, Randy Wayne Ramsey, Nishank Pathak, Ricky Wai Yeung Iu, Jimshed Mirza, Anthony Chan

OPTIMIZING PARTIAL WRITES TO COMPRESSED BLOCKS

Publication number: 20230206380

Abstract: A processor for optimizing partial writes to compressed blocks is configured to identify that a write request targets less than an entirety of a compressed block of pixel data, identify, based on a compression key, a compressed segment of the compressed block of pixel data that includes a target of the write request, and decompress, responsive to the write request, only the identified compressed segment of the compressed block of pixel data.
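
A minimal sketch of segment-granular decompression, assuming a block is split into fixed 64-byte segments that are compressed independently with `zlib`, and a compression key that stores each segment's compressed length. The segment size, key format, and function names are hypothetical. The key lets the write locate and decompress one segment while the others stay compressed.

```python
import zlib

SEGMENT_SIZE = 64  # hypothetical: uncompressed bytes per segment


def compress_block(pixels: bytes) -> tuple[bytes, list[int]]:
    """Compress fixed-size segments independently and concatenate them.

    The returned key lists each segment's compressed length, so any segment
    can be located without decompressing the ones before it.
    """
    parts = [zlib.compress(pixels[i:i + SEGMENT_SIZE])
             for i in range(0, len(pixels), SEGMENT_SIZE)]
    return b"".join(parts), [len(p) for p in parts]


def partial_write(block: bytes, key: list[int], offset: int,
                  data: bytes) -> tuple[bytes, list[int]]:
    """Apply a write smaller than the block by decompressing one segment only."""
    index, start = divmod(offset, SEGMENT_SIZE)
    if start + len(data) > SEGMENT_SIZE:
        raise ValueError("sketch assumes the write fits inside one segment")
    lo = sum(key[:index])  # byte offset of the target segment, from the key
    hi = lo + key[index]
    segment = bytearray(zlib.decompress(block[lo:hi]))
    segment[start:start + len(data)] = data
    new_part = zlib.compress(bytes(segment))
    new_key = key[:index] + [len(new_part)] + key[index + 1:]
    return block[:lo] + new_part + block[hi:], new_key


def decompress_block(block: bytes, key: list[int]) -> bytes:
    out, pos = [], 0
    for length in key:
        out.append(zlib.decompress(block[pos:pos + length]))
        pos += length
    return b"".join(out)


block, key = compress_block(bytes(256))
block, key = partial_write(block, key, 130, b"\xff\xff")
restored = decompress_block(block, key)
```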

Type: Application

Filed: December 28, 2021

Publication date: June 29, 2023

Inventors: Anthony HC Chan, Christopher J. Brennan, Mark Fowler, David Chui, Leon K.N. Lai, Jimshed Mirza

CASCADING EXECUTION OF ATOMIC OPERATIONS

Publication number: 20230205696

Abstract: Cascading execution of atomic operations, including: receiving a request for each thread of a plurality of threads to perform an atomic operation, wherein the plurality of threads comprises a plurality of thread subsets each corresponding to a local memory, wherein the local memory for a thread subset is accessible by the thread subset and inaccessible to a remainder of threads in the plurality of threads; generating a plurality of intermediate results by performing, by each thread subset, the atomic operation in the local memory corresponding to the thread subset; and generating a result for the request by aggregating the plurality of intermediate results in a shared memory accessible to all threads in the plurality of threads.
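
The two-level structure can be modeled with Python threads, where a per-subset lock stands in for an atomic in local memory and a global lock for an atomic in shared memory. The function name and the choice of addition as the atomic operation are assumptions for illustration. The shared location receives one update per subset rather than one per thread, which is the contention the cascade is meant to reduce.

```python
import threading


def cascading_atomic_add(values: list[int], subset_size: int) -> tuple[int, int]:
    """Sum one value per thread, first per subset, then across subsets.

    Returns (total, number of updates made to the shared location).
    """
    shared = {"value": 0, "updates": 0}
    shared_lock = threading.Lock()  # stands in for an atomic in shared memory

    def run_subset(subset: list[int]) -> None:
        local = {"value": 0}
        local_lock = threading.Lock()  # stands in for an atomic in local memory

        def thread_body(v: int) -> None:
            with local_lock:
                local["value"] += v

        threads = [threading.Thread(target=thread_body, args=(v,)) for v in subset]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # One atomic update of the shared result per subset.
        with shared_lock:
            shared["value"] += local["value"]
            shared["updates"] += 1

    subsets = [values[i:i + subset_size] for i in range(0, len(values), subset_size)]
    groups = [threading.Thread(target=run_subset, args=(s,)) for s in subsets]
    for g in groups:
        g.start()
    for g in groups:
        g.join()
    return shared["value"], shared["updates"]


total, shared_updates = cascading_atomic_add(list(range(1, 101)), subset_size=25)
```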

Type: Application

Filed: December 28, 2021

Publication date: June 29, 2023

Inventors: Jimshed Mirza, Mark Fowler

TRANSLATION LOOKASIDE BUFFER ENTRY ALLOCATION SYSTEM AND METHOD

Publication number: 20230103230

Abstract: A processing system includes a translation lookaside buffer (TLB). The TLB includes a plurality of TLB entries that are configured to store requested page size indications. The TLB is configured to be indexed via the requested page size indications such that a plurality of TLB requests that each indicate the same virtual address but different requested page sizes are allocated respective TLB entries. As a result, in response to a TLB request that indicates a requested page size and has a virtual address that corresponds to multiple TLB entries, only a single TLB entry is identified as a TLB hit.
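
The indexing idea can be sketched with a dictionary keyed by (virtual page number, requested page size). The class and method names are hypothetical. Because the page size is part of the key, two requests for the same virtual address with different requested sizes occupy separate entries, and a lookup that supplies a page size matches at most one of them.

```python
from typing import Optional


class PageSizeIndexedTLB:
    """Hypothetical TLB whose entries are keyed by (virtual page, requested size)."""

    def __init__(self) -> None:
        self.entries: dict[tuple[int, int], int] = {}

    @staticmethod
    def _key(vaddr: int, page_size: int) -> tuple[int, int]:
        # The requested page size is part of the index, so requests for the
        # same virtual address but different sizes map to different entries.
        return (vaddr // page_size, page_size)

    def allocate(self, vaddr: int, page_size: int, phys_base: int) -> None:
        self.entries[self._key(vaddr, page_size)] = phys_base

    def lookup(self, vaddr: int, page_size: int) -> Optional[int]:
        # At most one entry can match a (virtual address, page size) pair.
        base = self.entries.get(self._key(vaddr, page_size))
        return None if base is None else base + vaddr % page_size


tlb = PageSizeIndexedTLB()
tlb.allocate(0x201000, 4096, 0x9000)  # 4 KiB request
tlb.allocate(0x201000, 0x200000, 0x400000)  # 2 MiB request, same address
```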

Type: Application

Filed: September 27, 2021

Publication date: March 30, 2023

Inventors: Edwin Pang, Jimshed Mirza

Method and system for partial wavefront merger

Patent number: 10877926

Abstract: A method and system for partial wavefront merger is described. Vector processing machines employ the partial wavefront merger to merge partial wavefronts into one or more wavefronts. The system includes a partial wavefront manager and unified registers. The partial wavefront manager detects wavefronts in different single-instruction-multiple-data (“SIMD”) units which contain inactive work items and active work items (hereinafter referred to as “partial wavefronts”), moves the partial wavefronts into one or more SIMD unit(s) and merges the partial wavefronts into one or more wavefront(s). The unified registers allow each active work item in the one or more merged wavefront(s) to access the previously allocated registers in the originating SIMD units. Consequently, the contents of the unified registers do not have to be copied to the SIMD unit(s) executing the one or more merged wavefront(s).
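
A sketch of the packing step, assuming an 8-lane wavefront width and representing each partial wavefront only by its active-lane mask. Both are hypothetical simplifications; production GPUs typically use 32- or 64-wide wavefronts. Each merged lane keeps a reference to its originating SIMD unit and lane, which is the information a unified register file would need to reach that lane's existing registers without copying them.

```python
WAVE_WIDTH = 8  # hypothetical lanes per wavefront


def merge_partial_wavefronts(masks: dict[int, list[bool]]) -> list[list[tuple[int, int]]]:
    """Pack the active lanes of partial wavefronts into as few wavefronts as possible.

    `masks` maps a SIMD unit id to the active-lane mask of its partial wavefront.
    Each lane of a merged wavefront is recorded as (origin SIMD, origin lane),
    the index a unified register file would use to reach the lane's registers.
    """
    active = [
        (simd, lane)
        for simd, mask in sorted(masks.items())
        for lane, is_active in enumerate(mask)
        if is_active
    ]
    return [active[i:i + WAVE_WIDTH] for i in range(0, len(active), WAVE_WIDTH)]


partial = {
    0: [True, True, True, False, False, False, False, False],
    1: [False, False, False, False, True, True, True, True],
}
merged = merge_partial_wavefronts(partial)
```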

Type: Grant

Filed: July 23, 2018

Date of Patent: December 29, 2020

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Yunpeng Zhu, Jimshed Mirza

Data driven scheduler on multiple computing cores

Patent number: 10649810

Abstract: Methods, devices, and systems for data driven scheduling of a plurality of computing cores of a processor. A plurality of threads may be executed on the plurality of computing cores, according to a default schedule. The plurality of threads may be analyzed, based on the execution, to determine correlations among the plurality of threads. A data driven schedule may be generated based on the correlations. The plurality of threads may be executed on the plurality of computing cores according to the data driven schedule.
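
One way to turn observed correlations into a schedule is a greedy placement. In this sketch the correlation metric (the overlap between the sets of addresses two threads touched during the default-schedule run), the per-core capacity, and the tie-breaking rule are all assumptions; the abstract does not specify them.

```python
def data_driven_schedule(accesses: dict[str, set[int]], num_cores: int,
                         capacity: int) -> list[list[str]]:
    """Place threads so that correlated threads share a core.

    `accesses` maps each thread to the addresses it touched under the default
    schedule. Correlation is the number of shared addresses (an assumed metric).
    Raises ValueError if num_cores * capacity is smaller than the thread count.
    """
    cores: list[list[str]] = [[] for _ in range(num_cores)]

    def affinity(thread: str, core: list[str]) -> tuple[int, int]:
        shared = sum(len(accesses[thread] & accesses[other]) for other in core)
        return shared, -len(core)  # prefer emptier cores when affinity ties

    for thread in sorted(accesses):
        open_cores = [c for c in cores if len(c) < capacity]
        best = max(open_cores, key=lambda c: affinity(thread, c))
        best.append(thread)
    return cores


schedule = data_driven_schedule(
    {"t0": {1, 2, 3}, "t1": {10, 11}, "t2": {2, 3, 4}, "t3": {11, 12}},
    num_cores=2, capacity=2)
```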

Type: Grant

Filed: December 28, 2015

Date of Patent: May 12, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Jimshed Mirza, YunPeng Zhu

Flexible shader export design in multiple computing cores

Patent number: 10606740

Abstract: Systems, apparatuses, and methods for generating flexibly addressed memory requests are disclosed. In one embodiment, a system includes a processor, control unit, and memory subsystem. The processor launches a plurality of threads on a plurality of compute units, wherein each thread generates memory requests without specifying target memory addresses. The threads executing on the plurality of compute units convey a plurality of memory requests to the control unit. The control unit generates target memory addresses for the plurality of received memory requests. In one embodiment, the memory requests are write requests, and the control unit interleaves write requests from the plurality of threads into a single output buffer stored in the memory subsystem. The control unit can be located in a cache, in a memory controller, or in another location within the system.
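
The address-assignment role of the control unit can be sketched with a lock-protected write pointer. `ExportControlUnit` and its methods are hypothetical names. Threads submit only payloads; the control unit picks each target slot, so writes from all threads end up interleaved in one densely packed output buffer.

```python
import threading


class ExportControlUnit:
    """Hypothetical control unit that assigns target addresses to writes."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer: list = [None] * buffer_size  # the single output buffer
        self.next_slot = 0
        self._lock = threading.Lock()

    def write(self, payload) -> int:
        # The requester supplies data only; the control unit chooses the slot,
        # so writes from all threads are packed back to back in arrival order.
        with self._lock:
            slot = self.next_slot
            self.next_slot += 1
        self.buffer[slot] = payload  # IndexError if the buffer is full
        return slot


unit = ExportControlUnit(buffer_size=12)


def shader_thread(tid: int) -> None:
    for i in range(3):
        unit.write((tid, i))


workers = [threading.Thread(target=shader_thread, args=(t,)) for t in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```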

Type: Grant

Filed: May 26, 2017

Date of Patent: March 31, 2020

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Yunpeng Zhu, Jimshed Mirza

Hardware structure to track page reuse

Patent number: 10580110

Abstract: Systems, apparatuses, and methods for tracking page reuse and migrating pages are disclosed. In one embodiment, a system includes one or more processors, a memory access monitor, and multiple memory regions. The memory access monitor tracks accesses to memory pages in a system memory during a programmable interval. If the number of accesses to a given page is greater than a programmable threshold during the programmable interval, then the memory access monitor generates an interrupt for software to migrate the given page from the system memory to a local memory. If the number of accesses to the given page is less than or equal to the programmable threshold during the programmable interval, then the given page remains in the system memory. After the programmable interval, the memory access monitor starts tracking the number of accesses to a new page in a subsequent interval.
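
A behavioral sketch of the monitor, with several assumptions the abstract leaves open: the interval is counted in observed accesses rather than cycles, the first page accessed in an interval becomes the tracked page, and the migration interrupt is represented by appending to a list. The class and attribute names are hypothetical.

```python
from typing import Optional


class MemoryAccessMonitor:
    """Hypothetical model: counts accesses to one tracked page per interval."""

    def __init__(self, threshold: int, interval: int) -> None:
        self.threshold = threshold  # programmable access threshold
        self.interval = interval  # programmable interval (here: accesses observed)
        self.tracked: Optional[int] = None
        self.count = 0
        self.elapsed = 0
        self.interrupts: list[int] = []  # pages software is asked to migrate

    def observe(self, page: int) -> None:
        if self.tracked is None:
            self.tracked = page  # assumption: first page seen is tracked
        if page == self.tracked:
            self.count += 1
        self.elapsed += 1
        if self.elapsed == self.interval:
            if self.count > self.threshold:
                # Hot page: signal software to migrate it to local memory.
                self.interrupts.append(self.tracked)
            # Otherwise the page stays in system memory. Either way, start a
            # new interval that will track a new page.
            self.tracked, self.count, self.elapsed = None, 0, 0


monitor = MemoryAccessMonitor(threshold=3, interval=6)
for page in [5, 5, 7, 5, 5, 9, 8, 8, 1, 2, 3, 4]:
    monitor.observe(page)
```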

Type: Grant

Filed: April 25, 2017

Date of Patent: March 3, 2020

Assignee: ATI Technologies ULC

Inventors: Jimshed Mirza, Al Hasanur Rahman, Sergey Korobkov, Houman Namiranian

Multiple linked list data structure

Patent number: 10545887

Abstract: A system and method for maintaining information of pending operations are described. A buffer uses multiple linked lists implementing a single logical queue for a single requestor. The buffer maintains multiple head pointers and multiple tail pointers for the single requestor. Data entries of the single logical queue are stored in an alternating pattern among the multiple linked lists. During the allocation of buffer entries, the tail pointers are selected in the same alternating manner, and during the deallocation of buffer entries, the multiple head pointers are selected in the same manner.
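
A software model of the buffer, assuming entries live in a shared array with `next` links and a free pool. The class name and fields are hypothetical. Allocation rotates through the tail pointers and deallocation rotates through the head pointers in the same order, so the structure still behaves as a single FIFO for its requestor.

```python
class MultiListQueue:
    """One logical FIFO for a single requestor, spread over several linked lists.

    Hypothetical software model: entries live in a shared array with `next`
    links. Allocation rotates through the tail pointers and deallocation
    rotates through the head pointers in the same order, preserving FIFO order.
    """

    def __init__(self, capacity: int, lists: int = 2) -> None:
        self.data: list = [None] * capacity
        self.next = [-1] * capacity  # -1 terminates a linked list
        self.free = list(range(capacity))  # unallocated buffer entries
        self.heads = [-1] * lists
        self.tails = [-1] * lists
        self.alloc_sel = 0  # tail pointer used by the next allocation
        self.dealloc_sel = 0  # head pointer used by the next deallocation

    def push(self, value) -> None:
        entry = self.free.pop()  # IndexError when the buffer is full
        self.data[entry], self.next[entry] = value, -1
        k = self.alloc_sel
        if self.tails[k] == -1:
            self.heads[k] = entry  # list k was empty
        else:
            self.next[self.tails[k]] = entry
        self.tails[k] = entry
        self.alloc_sel = (k + 1) % len(self.tails)

    def pop(self):
        k = self.dealloc_sel
        entry = self.heads[k]
        if entry == -1:
            raise IndexError("queue is empty")
        self.heads[k] = self.next[entry]
        if self.heads[k] == -1:
            self.tails[k] = -1  # list k is now empty
        self.dealloc_sel = (k + 1) % len(self.heads)
        self.free.append(entry)
        return self.data[entry]


queue = MultiListQueue(capacity=8, lists=2)
for item in "abcde":
    queue.push(item)
drained = [queue.pop() for _ in range(5)]
```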

Type: Grant

Filed: February 24, 2017

Date of Patent: January 28, 2020

Assignee: ATI Technologies ULC

Inventors: Jimshed Mirza, Qian Ma

High-speed selective cache invalidates and write-backs on GPUs

Patent number: 10540280

Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.
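
The split-and-dispatch flow can be sketched as follows. Everything here is a simplification under assumptions: `CacheModel`, `process_request`, the single-page request, and the dictionary used as a page table are hypothetical, and a thread pool stands in for parallel hardware paths. Physically tagged caches receive the translated address, virtually tagged caches receive the original one, and the requester is notified only after both batches finish.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

PAGE_SIZE = 4096


@dataclass
class CacheModel:
    name: str
    physically_tagged: bool
    handled: list = field(default_factory=list)  # micro-transactions received


def process_request(op, vaddr, caches, page_table, notify):
    """Split one invalidate/write-back request by tag type and execute it.

    Assumes the request covers a single page and `page_table` maps virtual
    page numbers to physical page numbers.
    """
    paddr = page_table[vaddr // PAGE_SIZE] * PAGE_SIZE + vaddr % PAGE_SIZE
    # One micro-transaction per targeted cache, addressed the way it is tagged.
    physical = [(c, op, paddr) for c in caches if c.physically_tagged]
    virtual = [(c, op, vaddr) for c in caches if not c.physically_tagged]

    def run(batch):
        for cache, kind, addr in batch:
            cache.handled.append((kind, addr))

    # Physically and virtually tagged batches are processed in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        for future in [pool.submit(run, physical), pool.submit(run, virtual)]:
            future.result()
    notify(op, vaddr)  # only after every micro-transaction has completed


l1 = CacheModel("L1", physically_tagged=False)
l2 = CacheModel("L2", physically_tagged=True)
done = []
process_request("invalidate", 0x1234, [l1, l2], {0x1: 0x7},
                lambda op, va: done.append((op, va)))
```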

Type: Grant

Filed: December 23, 2016

Date of Patent: January 21, 2020

Assignees: ADVANCED MICRO DEVICES, INC., ATI TECHNOLOGIES ULC

Inventors: Mark Fowler, Jimshed Mirza, Anthony Asaro

Method and apparatus for translation lookaside buffer with multiple compressed encodings

Patent number: 10540290

Abstract: Methods and apparatus obtain one or more system page table entries that represent virtual system (e.g., memory) page to physical system page translations. A number of the obtained system page table entries that can be encoded in each of a plurality of translation lookaside buffer (TLB) entry encoding formats are determined. The method and apparatus may select one of the TLB entry encoding formats that encode a number of the obtained system page table entries. The method and apparatus may encode a number of obtained system page table entries in the TLB entry encoding format selected into a compressed encoding format TLB entry. The method and apparatus may associate the compressed encoding format TLB entry with an encoding format indication of the encoding format selected. The method and apparatus may decode a compressed encoding format TLB entry based on a determined TLB entry encoding format.
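
A sketch of format selection with three hypothetical encodings over physical page numbers (PPNs): a contiguous run of up to eight PPNs (stored as base and count), up to four PPNs sharing the same high bits (stored as the common high bits plus one low byte each), and a single uncompressed entry. The formats and limits are invented for illustration. The encoder counts how many leading entries each format can hold, picks the format that holds the most, and tags the entry with that format so the decoder knows how to expand it. The sketch assumes at least one page table entry is supplied.

```python
def _contiguous(ppns: list[int]) -> bool:
    return all(b == a + 1 for a, b in zip(ppns, ppns[1:]))


def _shared_high(ppns: list[int]) -> bool:
    return len({p >> 8 for p in ppns}) == 1


# Hypothetical TLB entry formats: name -> (max entries, can these be encoded?)
FORMATS = {
    "contiguous": (8, _contiguous),    # stored as (base PPN, count)
    "shared_high": (4, _shared_high),  # stored as (high bits, low bytes)
    "single": (1, lambda ppns: True),  # one uncompressed translation
}


def encodable(fmt: str, ppns: list[int]) -> int:
    """How many leading page table entries `fmt` can pack into one TLB entry."""
    limit, fits = FORMATS[fmt]
    n = min(limit, len(ppns))
    while n > 1 and not fits(ppns[:n]):
        n -= 1
    return n


def encode(ppns: list[int]) -> tuple:
    """Pick the format covering the most entries; earlier formats win ties."""
    counts = {fmt: encodable(fmt, ppns) for fmt in FORMATS}
    fmt = max(counts, key=counts.get)
    n = counts[fmt]
    if fmt == "contiguous":
        payload = (ppns[0], n)
    elif fmt == "shared_high":
        payload = (ppns[0] >> 8, tuple(p & 0xFF for p in ppns[:n]))
    else:
        payload = (ppns[0],)
    return fmt, payload  # the format indication is stored with the entry


def decode(entry: tuple) -> list[int]:
    fmt, payload = entry
    if fmt == "contiguous":
        base, n = payload
        return [base + i for i in range(n)]
    if fmt == "shared_high":
        high, lows = payload
        return [(high << 8) | low for low in lows]
    return [payload[0]]


entry_a = encode([0x100, 0x101, 0x102, 0x103, 0x200])
entry_b = encode([0x105, 0x1A0, 0x1FF, 0x3000])
```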

Type: Grant

Filed: April 27, 2016

Date of Patent: January 21, 2020

Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.

Inventors: Gabriel H Loh, Jimshed Mirza

METHOD AND SYSTEM FOR PARTIAL WAVEFRONT MERGER

Publication number: 20200019530

Abstract: A method and system for partial wavefront merger is described. Vector processing machines employ the partial wavefront merger to merge partial wavefronts into one or more wavefronts. The system includes a partial wavefront manager and unified registers. The partial wavefront manager detects wavefronts in different single-instruction-multiple-data (“SIMD”) units which contain inactive work items and active work items (hereinafter referred to as “partial wavefronts”), moves the partial wavefronts into one or more SIMD unit(s) and merges the partial wavefronts into one or more wavefront(s). The unified registers allow each active work item in the one or more merged wavefront(s) to access the previously allocated registers in the originating SIMD units. Consequently, the contents of the unified registers do not have to be copied to the SIMD unit(s) executing the one or more merged wavefront(s).

Type: Application

Filed: July 23, 2018

Publication date: January 16, 2020

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Yunpeng Zhu, Jimshed Mirza

Shader writes to compressed resources

Patent number: 10535178

Abstract: Systems, apparatuses, and methods for performing shader writes to compressed surfaces are disclosed. In one embodiment, a processor includes at least a memory and one or more shader units. In one embodiment, a shader unit of the processor is configured to receive a write request targeted to a compressed surface. The shader unit is configured to identify a first block of the compressed surface targeted by the write request. Responsive to determining the data of the write request targets less than the entirety of the first block, the shader unit reads the first block from the cache and decompresses the first block. Next, the shader unit merges the data of the write request with the decompressed first block. Then, the shader unit compresses the merged data and writes the merged data to the cache.
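
The read, decompress, merge, and recompress sequence can be sketched with `zlib` standing in for the GPU's compression scheme and a dictionary standing in for the cache. The 256-byte block size and the function name are hypothetical. A write covering the whole block skips the read and decompression, matching the abstract's condition that the merge happens only for writes smaller than the block.

```python
import zlib

BLOCK_SIZE = 256  # hypothetical uncompressed block size in bytes


def shader_write(cache: dict[int, bytes], block_id: int, offset: int,
                 data: bytes) -> None:
    """Write `data` into a compressed block held in `cache`."""
    if offset + len(data) > BLOCK_SIZE:
        raise ValueError("write extends past the block")
    if offset == 0 and len(data) == BLOCK_SIZE:
        cache[block_id] = zlib.compress(data)  # full overwrite: no read needed
        return
    block = bytearray(zlib.decompress(cache[block_id]))  # read and decompress
    block[offset:offset + len(data)] = data  # merge the write into the block
    cache[block_id] = zlib.compress(bytes(block))  # recompress, write to cache


cache = {7: zlib.compress(bytes(BLOCK_SIZE))}
shader_write(cache, 7, 10, b"abc")
merged = zlib.decompress(cache[7])
```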

Type: Grant

Filed: December 22, 2016

Date of Patent: January 14, 2020

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Jimshed Mirza, Christopher J. Brennan, Anthony Chan, Leon Lai

Register allocation modes in a GPU based on total, maximum concurrent, and minimum number of registers needed by complex shaders

Patent number: 10353859

Abstract: A method for allocating registers in a compute unit of a vector processor includes determining a maximum number of registers that are to be used concurrently by a plurality of threads of a kernel at the compute unit. The method further includes setting a mode of register allocation at the compute unit based on a comparison of the determined maximum number of registers and a total number of physical registers implemented at the compute unit.
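
The mode decision reduces to a comparison. In this sketch the two mode names and the way the concurrent requirement is computed (per-thread peak times the number of threads resident on the compute unit) are assumptions; the abstract states only that the mode is chosen by comparing the maximum concurrent requirement with the number of physical registers.

```python
from enum import Enum


class RegisterMode(Enum):
    STATIC = "static"    # hypothetical: reserve each thread's peak up front
    DYNAMIC = "dynamic"  # hypothetical: allocate and release during execution


def max_concurrent_registers(per_thread_peak: int, resident_threads: int) -> int:
    # Assumed model: every resident thread may hold its peak at the same time.
    return per_thread_peak * resident_threads


def choose_mode(per_thread_peak: int, resident_threads: int,
                physical_registers: int) -> RegisterMode:
    needed = max_concurrent_registers(per_thread_peak, resident_threads)
    # Static allocation works when the worst case fits in the register file.
    return RegisterMode.STATIC if needed <= physical_registers else RegisterMode.DYNAMIC
```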

Type: Grant

Filed: February 14, 2017

Date of Patent: July 16, 2019

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: YunPeng Zhu, Jimshed Mirza

Selecting a default page size in a variable page size TLB

Patent number: 10241925

Abstract: Systems, apparatuses, and methods for selecting default page sizes in a variable page size translation lookaside buffer (TLB) are disclosed. In one embodiment, a system includes at least one processor, a memory subsystem, and a first TLB. The first TLB is configured to allocate a first entry for a first request responsive to detecting a miss for the first request in the first TLB. Prior to determining a page size targeted by the first request, the first TLB specifies, in the first entry, that the first request targets a page of a first page size. Responsive to determining that the first request actually targets a second page size, the first TLB reissues the first request with an indication that the first request targets the second page size. On the reissue, the first TLB allocates a second entry and specifies the second page size for the first request.
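
A sketch of the miss-then-reissue flow. `VariablePageTLB`, the walker callback, and the decision to drop the mis-sized placeholder entry are assumptions; the abstract does not say what happens to the first entry. Hits probe every page size currently held, so a later access to the same large page hits without another reissue.

```python
from typing import Callable, Optional

# Hypothetical page-table walker: vaddr -> (actual page size, physical page base).
Walker = Callable[[int], tuple[int, int]]


class VariablePageTLB:
    def __init__(self, walk: Walker, default_size: int = 4096) -> None:
        self.walk = walk
        self.default_size = default_size
        self.sizes = {default_size}  # page sizes currently held in the TLB
        self.entries: dict[tuple[int, int], Optional[int]] = {}
        self.reissues = 0

    def _hit(self, vaddr: int) -> Optional[int]:
        for size in self.sizes:
            base = self.entries.get((vaddr // size, size))
            if base is not None:
                return base + vaddr % size
        return None

    def translate(self, vaddr: int, size: Optional[int] = None) -> int:
        if size is None:
            hit = self._hit(vaddr)
            if hit is not None:
                return hit
            size = self.default_size  # page size not yet known: assume the default
        key = (vaddr // size, size)
        self.entries[key] = None  # entry allocated while the walk is in flight
        actual, phys_base = self.walk(vaddr)
        if actual != size:
            del self.entries[key]  # assumption: the mis-sized entry is dropped
            self.reissues += 1
            return self.translate(vaddr, actual)  # reissue with the real size
        self.sizes.add(size)
        self.entries[key] = phys_base
        return phys_base + vaddr % size


def walk(vaddr: int) -> tuple[int, int]:
    # Hypothetical mapping: 4 KiB pages below 2 MiB, one 2 MiB page above.
    if vaddr < 0x200000:
        return 4096, 0x100000 + (vaddr // 4096) * 4096
    return 0x200000, 0x40000000


tlb = VariablePageTLB(walk)
small = tlb.translate(0x1234)
large = tlb.translate(0x201234)  # misses at 4 KiB, reissued as 2 MiB
again = tlb.translate(0x3FFFF0)  # hits the 2 MiB entry
```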

Type: Grant

Filed: February 15, 2017

Date of Patent: March 26, 2019

Assignee: ATI Technologies ULC

Inventors: Jimshed Mirza, Anthony Chan, Edwin Chi Yeung Pang

Input/output memory map unit and northbridge

Patent number: 10223280

Abstract: A system including a gasket communicatively coupled between a unified northbridge (UNB) having a cache coherent interconnect (CCI) interface and a processor having an Advanced eXtensible Interface (AXI) coherency extension (ACE). The gasket is configured to translate requests from the processor that include ACE commands into equivalent CCI commands, wherein each request from the processor maps onto a specific CCI request type. The gasket is further configured to translate ACE tags into CCI tags. The gasket is further configured to translate CCI encoded probes from a system resource interface (SRI) into equivalent ACE snoop transactions. The gasket is further configured to translate the memory map to inter-operate with a UNB/coherent HyperTransport (cHT) environment. The gasket is further configured to receive a barrier transaction that is used to provide ordering for transactions.

Type: Grant

Filed: July 2, 2018

Date of Patent: March 5, 2019

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Vydhyanathan Kalyanasundharam, Yaniv Adiri, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien

FLEXIBLE SHADER EXPORT DESIGN IN MULTIPLE COMPUTING CORES

Publication number: 20180314528

Abstract: Systems, apparatuses, and methods for generating flexibly addressed memory requests are disclosed. In one embodiment, a system includes a processor, control unit, and memory subsystem. The processor launches a plurality of threads on a plurality of compute units, wherein each thread generates memory requests without specifying target memory addresses. The threads executing on the plurality of compute units convey a plurality of memory requests to the control unit. The control unit generates target memory addresses for the plurality of received memory requests. In one embodiment, the memory requests are write requests, and the control unit interleaves write requests from the plurality of threads into a single output buffer stored in the memory subsystem. The control unit can be located in a cache, in a memory controller, or in another location within the system.

Type: Application

Filed: May 26, 2017

Publication date: November 1, 2018

Inventors: Yunpeng Zhu, Jimshed Mirza

INPUT/OUTPUT MEMORY MAP UNIT AND NORTHBRIDGE

Publication number: 20180307619

Abstract: A system including a gasket communicatively coupled between a unified northbridge (UNB) having a cache coherent interconnect (CCI) interface and a processor having an Advanced eXtensible Interface (AXI) coherency extension (ACE). The gasket is configured to translate requests from the processor that include ACE commands into equivalent CCI commands, wherein each request from the processor maps onto a specific CCI request type. The gasket is further configured to translate ACE tags into CCI tags. The gasket is further configured to translate CCI encoded probes from a system resource interface (SRI) into equivalent ACE snoop transactions. The gasket is further configured to translate the memory map to inter-operate with a UNB/coherent HyperTransport (cHT) environment. The gasket is further configured to receive a barrier transaction that is used to provide ordering for transactions.

Type: Application

Filed: July 2, 2018

Publication date: October 25, 2018

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri

HARDWARE STRUCTURE TO TRACK PAGE REUSE

Publication number: 20180308216

Abstract: Systems, apparatuses, and methods for tracking page reuse and migrating pages are disclosed. In one embodiment, a system includes one or more processors, a memory access monitor, and multiple memory regions. The memory access monitor tracks accesses to memory pages in a system memory during a programmable interval. If the number of accesses to a given page is greater than a programmable threshold during the programmable interval, then the memory access monitor generates an interrupt for software to migrate the given page from the system memory to a local memory. If the number of accesses to the given page is less than or equal to the programmable threshold during the programmable interval, then the given page remains in the system memory. After the programmable interval, the memory access monitor starts tracking the number of accesses to a new page in a subsequent interval.

Type: Application

Filed: April 25, 2017

Publication date: October 25, 2018

Inventors: Jimshed Mirza, Al Hasanur Rahman, Sergey Korobkov, Houman Namiranian
