Patents by Inventor Anjana Rajendran

Anjana Rajendran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240095065
    Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.
    Type: Application
    Filed: November 10, 2022
    Publication date: March 21, 2024
    Inventors: Benjiman L. Goodman, Anjana Rajendran, Sheenam Jayaswal, Terence M. Potter, Yoong Chert Foo
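    The two-level arbitration described in 20240095065 can be pictured with a small software model. The sketch below is only an illustration under assumed names and policies (the per-pipeline backpressure counter, the static thread-to-pipeline mapping, and the priority demotion rule are inventions of the sketch, not the claimed circuitry): a first-stage scheduler assigns threads to free channels, a second-stage scheduler issues from channels to execution pipelines, and pipeline occupancy feeds back to demote threads headed for a congested pipeline.
    ```python
    # Minimal model of two-stage scheduling with backpressure. Names and
    # policies are illustrative assumptions, not the claimed hardware.
    from collections import deque

    class ExecPipeline:
        def __init__(self, name, capacity=4):
            self.name, self.queue, self.capacity = name, deque(), capacity

        def backpressure(self):
            # 0.0 = empty, 1.0 = full; reported to the first-stage scheduler.
            return len(self.queue) / self.capacity

    class TwoStageScheduler:
        def __init__(self, num_channels, pipelines):
            self.channels = [None] * num_channels   # channel slot -> thread id
            self.pipelines = pipelines

        def target(self, thread):
            # Assumed static thread-to-pipeline mapping for the sketch.
            return self.pipelines[thread % len(self.pipelines)]

        def stage1_assign(self, ready, priority):
            # First stage: arbitrate among threads for free channels,
            # demoting threads whose target pipeline reports backpressure.
            for ch, owner in enumerate(self.channels):
                if owner is None and ready:
                    ready.sort(key=lambda t: priority[t]
                               - self.target(t).backpressure(), reverse=True)
                    self.channels[ch] = ready.pop(0)

        def stage2_issue(self):
            # Second stage: arbitrate among occupied channels for pipelines.
            for ch, thread in enumerate(self.channels):
                if thread is not None:
                    pipe = self.target(thread)
                    if len(pipe.queue) < pipe.capacity:
                        pipe.queue.append(thread)
                        self.channels[ch] = None

    pipes = [ExecPipeline("alu"), ExecPipeline("mem")]
    sched = TwoStageScheduler(num_channels=4, pipelines=pipes)
    sched.stage1_assign([0, 1, 2, 3], priority={0: 1, 1: 3, 2: 2, 3: 0})
    sched.stage2_issue()
    print([(p.name, list(p.queue)) for p in pipes])
    ```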
  • Publication number: 20230385201
    Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting thread. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
    Type: Application
    Filed: July 31, 2023
    Publication date: November 30, 2023
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
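    The page-pool and translation-table mechanism of 20230385201 (repeated verbatim in the related grant 11714759 and publication 20220050790 below) can be sketched as follows. Everything here is an assumption chosen for illustration: the page size, the dict-based translation table, and the class name all stand in for hardware structures.
    ```python
    # Illustrative model of a private-memory page pool with a mapping
    # "thread" that fills a (thread id -> virtual page) translation table.
    PAGE_SIZE = 4096

    class PrivateMemoryManager:
        def __init__(self, num_pages):
            # Pool of free virtual pages reserved up front for the work set.
            self.free_pool = [0x1000_0000 + i * PAGE_SIZE for i in range(num_pages)]
            self.translation = {}   # thread id -> virtual page base

        def map_request(self, thread_id):
            # Body of the mapping thread: grab a page from the pool and
            # record it in the translation table for the requesting thread.
            if thread_id not in self.translation:
                self.translation[thread_id] = self.free_pool.pop()
            return self.translation[thread_id]

        def translate(self, thread_id, private_addr):
            # Executed on each private-memory access: private address ->
            # virtual address via the thread's mapped page.
            base = self.translation[thread_id]
            return base + (private_addr % PAGE_SIZE)

    mm = PrivateMemoryManager(num_pages=8)
    mm.map_request(thread_id=42)
    print(hex(mm.translate(thread_id=42, private_addr=0x10)))
    ```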
  • Patent number: 11714759
    Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting thread. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
    Type: Grant
    Filed: August 17, 2020
    Date of Patent: August 1, 2023
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
  • Patent number: 11360780
    Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
    Type: Grant
    Filed: January 22, 2020
    Date of Patent: June 14, 2022
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Jeffrey T. Brady, Brian K. Reynolds, Jeffrey A. Lohman
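    A toy model of the atomic context-switch sequence in 11360780 (also published as 20210224072 below): save the program counter and the per-lane active mask, force every lane active, then branch to the handler. The field names and the 32-bit mask are illustrative assumptions; in hardware the three steps form a single atomic operation, which the sketch approximates by running them back-to-back with nothing observing the intermediate state.
    ```python
    # Toy model of the atomic context-switch sequence for a SIMD group.
    from dataclasses import dataclass, field

    @dataclass
    class SimdGroup:
        pc: int
        active_mask: int            # one bit per SIMD thread
        saved: dict = field(default_factory=dict)

    HANDLER_PC = 0xF000             # assumed address of the context-save handler

    def context_switch(group: SimdGroup):
        # Save PC and active mask to (modeled) context switch registers.
        group.saved = {"pc": group.pc, "active_mask": group.active_mask}
        group.active_mask = 0xFFFF_FFFF   # set all threads active for the handler
        group.pc = HANDLER_PC             # branch to handler code

    g = SimdGroup(pc=0x1234, active_mask=0b1010)
    context_switch(g)
    print(hex(g.pc), bin(g.saved["active_mask"]))
    ```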
  • Publication number: 20220050790
    Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting thread. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
    Type: Application
    Filed: August 17, 2020
    Publication date: February 17, 2022
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
  • Patent number: 11204774
    Abstract: Techniques are disclosed relating to a thread-group-scoped gate instruction. In some embodiments, graphics processor circuitry is configured to execute, for multiple SIMD groups of a thread group, a graphics program that includes a gate instruction. During execution of the gate instruction for a first SIMD group, the processor accesses state information to determine that a threshold number of other SIMD groups in the thread group have not yet executed the gate instruction. Based on the determination, the processor executes a particular set of instructions of the graphics program for the first SIMD group (that is not executed by one or more other SIMD groups that reach the gate instruction after the first SIMD group). For example, the particular set of instructions may be a utility program that performs one or more operations for the entire thread group but is only executed by a subset of the SIMD groups.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: December 21, 2021
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Anjana Rajendran, Jeffrey A. Lohman, Terence M. Potter
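    The gate instruction of 11204774 can be approximated in software as a check-and-increment against per-thread-group state: SIMD groups that arrive before the threshold is reached execute the gated utility code, and later arrivals skip it. The lock-protected counter below is an assumption standing in for the processor's state information.
    ```python
    # Sketch of a thread-group-scoped gate: only SIMD groups that arrive
    # before a threshold count is reached execute the gated instructions.
    import threading

    class ThreadGroupGate:
        def __init__(self, threshold=1):
            self.count = 0
            self.threshold = threshold
            self.lock = threading.Lock()

        def passes(self):
            # Atomically check-and-increment the per-thread-group state.
            with self.lock:
                self.count += 1
                return self.count <= self.threshold

    gate = ThreadGroupGate(threshold=1)
    for simd_group in range(4):
        if gate.passes():
            print(f"SIMD group {simd_group}: runs utility code for the whole thread group")
        else:
            print(f"SIMD group {simd_group}: skips past the gate")
    ```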
  • Publication number: 20210224072
    Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
    Type: Application
    Filed: January 22, 2020
    Publication date: July 22, 2021
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Jeffrey T. Brady, Brian K. Reynolds, Jeffrey A. Lohman
  • Patent number: 10152329
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
    Type: Grant
    Filed: February 9, 2012
    Date of Patent: December 11, 2018
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
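    A minimal sketch of the pre-scheduled replay idea in 10152329 (also published as 20130212364 below): when a divergent access touches several cache lines, replay slots are inserted directly behind the instruction up front, rather than being discovered one at a time through the replay loop. The 128-byte line size and the string-based issue slots are illustrative assumptions.
    ```python
    # Pre-scheduled replay sketch: count the distinct cache lines touched
    # and insert that many replay slots behind the instruction up front.
    CACHE_LINE = 128

    def schedule_with_replays(instruction, per_thread_addresses):
        lines = {addr // CACHE_LINE for addr in per_thread_addresses}
        # One issue slot services one cache line; the remaining lines become
        # pre-scheduled replays executed back-to-back behind the instruction.
        return [instruction] + [f"{instruction}-replay{i}" for i in range(1, len(lines))]

    # Four threads touching three distinct cache lines -> two pre-scheduled
    # replays; any still-unserviced threads fall back to the replay loop.
    print(schedule_with_replays("LD", [0, 130, 260, 300]))
    ```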
  • Patent number: 10095548
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: October 9, 2018
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
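    The total order queue (TOQ) of 10095548 recurs in several entries below (9836325, 9755994, and publications 20130311996, 20130311686, and 20130311999), so one sketch covers them all: requests are served oldest first, a request sleeps while its resource is held, and an older request may steal a resource from a younger holder to avoid deadlock. The dict-and-list bookkeeping is an assumption, not the hardware queue.
    ```python
    # Minimal total-order-queue (TOQ) sketch with age-ordered service,
    # sleeping requests, and deadlock-avoiding resource stealing.
    import itertools

    class TotalOrderQueue:
        _age = itertools.count()        # global age stamp for ordering

        def __init__(self):
            self.pending = []           # (age, request_id, resource)
            self.owner = {}             # resource -> (age, request_id)

        def submit(self, request_id, resource):
            self.pending.append((next(self._age), request_id, resource))

        def step(self):
            # Oldest pending request gets first claim on its resource.
            self.pending.sort()
            served = []
            for age, rid, res in list(self.pending):
                holder = self.owner.get(res)
                if holder is None:
                    self.owner[res] = (age, rid)
                elif holder[0] > age:
                    # Deadlock avoidance: older request steals from younger.
                    self.owner[res] = (age, rid)
                else:
                    continue            # sleep until the resource frees up
                served.append(rid)
                self.pending.remove((age, rid, res))
            return served

    toq = TotalOrderQueue()
    toq.submit("old", "bankA")
    toq.submit("young", "bankA")
    print(toq.step())   # -> ['old']; 'young' sleeps until bankA is released
    ```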
  • Patent number: 9952977
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: April 24, 2018
    Assignee: NVIDIA Corporation
    Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
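    The cache-operations modifier of 9952977 can be illustrated with a hypothetical set of modifiers and the fill and replacement decisions they drive. The modifier names and the evict-first policy below are assumptions chosen for the sketch, not the patented encoding.
    ```python
    # Sketch of a cache-operations modifier: the instruction names the
    # hierarchy level where its data should be cached, and the fill and
    # replacement policy reacts to that hint.
    from enum import Enum

    class CacheOp(Enum):
        CACHE_ALL = "ca"      # cache at every level (e.g. L1 and L2)
        CACHE_GLOBAL = "cg"   # cache at L2 only, bypass L1
        STREAMING = "cs"      # cache, but mark the line evict-first

    def fill_policy(op: CacheOp):
        # Which levels allocate a line, and how sticky the line is.
        allocate_l1 = op is not CacheOp.CACHE_GLOBAL
        evict_first = op is CacheOp.STREAMING
        return {"allocate_l1": allocate_l1, "allocate_l2": True,
                "evict_first": evict_first}

    for op in CacheOp:
        print(op.name, fill_policy(op))
    ```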
  • Patent number: 9836325
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: December 5, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Patent number: 9817668
    Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: November 14, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
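    A companion sketch for 9817668, which batches replays inside the replay loop itself rather than pre-scheduling them: after the original issue, the remaining cache lines are serviced two at a time, back-to-back. The batch size of two mirrors the abstract's "two or more"; the rest is an illustrative assumption.
    ```python
    # Replay-loop batching sketch: issue replays in back-to-back pairs.
    BATCH = 2
    CACHE_LINE = 128

    def replay_loop(per_thread_addresses):
        remaining = sorted({a // CACHE_LINE for a in per_thread_addresses})
        remaining.pop(0)                 # first line handled by the original issue
        batches = []
        while remaining:
            batch, remaining = remaining[:BATCH], remaining[BATCH:]
            batches.append([f"replay(line {ln})" for ln in batch])
        return batches

    # Five distinct cache lines -> original issue plus two batches of two replays.
    print(replay_loop([0, 128, 256, 384, 512]))
    ```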
  • Patent number: 9755994
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: September 5, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Patent number: 9626191
    Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: April 18, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
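    The shaped register-file access of 9626191 amounts to gathering the same slice from each of N registers for a thread in a single access, with a crossbar doing the routing. The sketch below models the crossbar as one gather expression; the register-file layout is an assumption.
    ```python
    # Toy model of a "shaped" register-file access: one access returns a
    # slice of `width` bytes from each register in a set of N registers.
    def shaped_access(regfile, thread, reg_set, width):
        # regfile[reg][thread] is that thread's 32-bit value of register reg.
        # The crossbar is modeled as the gather expression below.
        mask = (1 << (8 * width)) - 1
        return [regfile[r][thread] & mask for r in reg_set]

    regfile = {
        "R0": [0x11223344, 0xAABBCCDD],
        "R1": [0x55667788, 0x99001122],
    }
    # Thread 0 reads the low 2 bytes of both R0 and R1 in a single access.
    print([hex(v) for v in shaped_access(regfile, thread=0,
                                         reg_set=["R0", "R1"], width=2)])
    ```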
  • Patent number: 8997103
    Abstract: One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group, execution of subsequent memory operations for the first thread group is suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.
    Type: Grant
    Filed: April 6, 2012
    Date of Patent: March 31, 2015
    Assignee: NVIDIA Corporation
    Inventors: Shirish Gadre, Charles McCarver, Anjana Rajendran, Omkar Paranjape, Steven James Heinrich
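    The N-way barrier coalescing of 8997103 can be modeled as a set of thread groups covered by one in-flight barrier: new barriers merge into the set, memory operations from covered groups stall, and unaffected groups keep issuing. The set-based representation is an illustrative assumption.
    ```python
    # Sketch of N-way memory-barrier coalescing across thread groups.
    class BarrierUnit:
        def __init__(self):
            self.suspended_groups = set()   # groups whose later memory ops stall

        def barrier(self, thread_group):
            # Coalesce: add the group to the in-flight barrier's scope.
            self.suspended_groups.add(thread_group)

        def may_issue(self, thread_group):
            # Memory ops from unaffected thread groups keep executing.
            return thread_group not in self.suspended_groups

        def drain(self):
            # The coalesced barrier completes for every covered group at once.
            self.suspended_groups.clear()

    bu = BarrierUnit()
    bu.barrier("tg0"); bu.barrier("tg1")            # two barriers, one coalesced op
    print(bu.may_issue("tg0"), bu.may_issue("tg2")) # False True
    bu.drain()
    print(bu.may_issue("tg0"))                      # True
    ```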
  • Patent number: 8595425
    Abstract: One embodiment of the present invention sets forth a technique for providing an L1 cache that is a central storage resource. The L1 cache services multiple clients with diverse latency and bandwidth requirements. The L1 cache may be reconfigured to create multiple storage spaces, enabling the L1 cache to replace dedicated buffers, caches, and FIFOs used in previous architectures. A "direct mapped" storage region that is configured within the L1 cache may replace dedicated buffers, FIFOs, and interface paths, allowing clients of the L1 cache to exchange attribute and primitive data. The direct mapped storage region may be used as a global register file. A "local and global cache" storage region configured within the L1 cache may be used to support load/store memory requests to multiple spaces. These spaces include global, local, and call-return stack (CRS) memory.
    Type: Grant
    Filed: September 25, 2009
    Date of Patent: November 26, 2013
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Steven James Heinrich, Rajeshwaran Selvanesan, Brett W. Coon, Charles McCarver, Anjana Rajendran, Stewart G. Carlton
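    The reconfigurable L1 of 8595425 can be sketched as one physical array carved into a direct-mapped scratch region and a conventional local/global cache region. The 50/50 split, the 128-byte lines, and the dict-based tag store are assumptions of the sketch, not the patented layout.
    ```python
    # Sketch of one physical L1 array split into a direct-mapped region
    # and a local/global cache region.
    class ReconfigurableL1:
        def __init__(self, total_bytes=65536, direct_fraction=0.5):
            self.direct_size = int(total_bytes * direct_fraction)
            self.cache_size = total_bytes - self.direct_size
            self.direct = bytearray(self.direct_size)   # buffer/FIFO-style storage
            self.cache = {}                             # tag -> line (toy cache)

        def direct_write(self, offset, data: bytes):
            # Direct-mapped region: clients address it like a scratch buffer,
            # e.g. to exchange attribute/primitive data or as a register file.
            self.direct[offset:offset + len(data)] = data

        def cached_load(self, addr, backing):
            # Local/global cache region: tag lookup with fill on miss.
            tag = addr // 128
            if tag not in self.cache:
                self.cache[tag] = backing(tag)          # fill from memory
            return self.cache[tag]

    l1 = ReconfigurableL1()
    l1.direct_write(0, b"attrib")
    print(bytes(l1.direct[0:6]), l1.cached_load(0x2000, backing=lambda t: f"line{t}"))
    ```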
  • Publication number: 20130311996
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Publication number: 20130311686
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Publication number: 20130311999
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Publication number: 20130212364
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan