Patents by Inventor Karim M. Abdalla

Karim M. Abdalla has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9921873
    Abstract: A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a cooperative thread array” (“CTA”) may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAs that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: March 20, 2018
    Assignee: NVIDIA Corporation
    Inventors: Lacky V. Shah, Karim M. Abdalla, Sean J. Treichler, Abraham B. de Waal
  • Patent number: 9792122
    Abstract: One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the first plurality of graphics primitives should be replayed from the buffer, and, in response, replaying the first plurality of graphics primitives against a first tile included in a first plurality of tiles. Replaying the first plurality of graphics primitives includes comparing each graphics primitive against the first tile to determine whether the graphics primitive intersects the first tile, determining that one or more graphics primitives intersects the first tile, and transmitting the one or more graphics primitives and one or more associated state bundles to a screen-space pipeline for processing.
    Type: Grant
    Filed: October 4, 2013
    Date of Patent: October 17, 2017
    Assignee: NVIDIA CORPORATION
    Inventors: Ziyad S. Hakura, Walter R. Steiner, Cynthia Ann Edgeworth Allison, Rouslan Dimitrov, Karim M. Abdalla, Dale L. Kirkland, Emmett M. Kilgariff
  • Patent number: 9715413
    Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
    Type: Grant
    Filed: January 18, 2012
    Date of Patent: July 25, 2017
    Assignee: NVIDIA Corporation
    Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
  • Patent number: 9710306
    Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.
    Type: Grant
    Filed: April 9, 2012
    Date of Patent: July 18, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla
  • Patent number: 9639366
    Abstract: One embodiment of the present invention sets forth a technique for managing buffer table entries in a tile-based architecture. The technique includes binding a plurality of shader registers to a buffer table entry. The technique further includes processing at least one tile by reading a buffer table index stored in the shader register to access the buffer table entry, reading a buffer address stored in the buffer table entry, accessing data associated with the buffer address, and unbinding the shader register from the buffer table entry. The technique further includes determining that none of the shader registers is still bound to the buffer table entry and, in response, causing a release packet to be inserted into an instruction stream. The technique further includes determining that a last tile has been processed and, in response, transmitting the release packet to cause the buffer table entry to be released.
    Type: Grant
    Filed: October 3, 2013
    Date of Patent: May 2, 2017
    Assignee: NVIDIA CORPORATION
    Inventors: Karim M. Abdalla, Ziyad S. Hakura, Cynthia Ann Edgeworth Allison, Dale L. Kirkland
  • Patent number: 9594599
    Abstract: A work distribution unit distributes work batches to general processing clusters (GPCs) based on the number of streaming multiprocessors included in each GPC. Advantageously, each GPC receives an amount of work that is proportional to the amount of processing power afforded by the GPC. Embodiments include a method for distributing batches of processing tasks to two or more general processing clusters (GPCs), including the steps of updating a counter value for each of the two or more GPCs based on the number of enabled parallel processing units within each of the two or more GPCs, and distributing a batch of processing tasks to a first GPC of the two or more GPCs based on a counter value associated with the first GPC and based on a load signal received from the first GPC.
    Type: Grant
    Filed: October 14, 2009
    Date of Patent: March 14, 2017
    Assignee: NVIDIA Corporation
    Inventors: Philip Browning Johnson, Dale L. Kirkland, Karim M. Abdalla
  • Patent number: 9507638
    Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.
    Type: Grant
    Filed: November 8, 2011
    Date of Patent: November 29, 2016
    Assignee: NVIDIA Corporation
    Inventors: Philip Alexander Cuadra, Karim M. Abdalla, Jerome F. Duluk, Jr., Luke Durant, Gerald F. Luiz, Timothy John Purcell, Lacky V. Shah
  • Patent number: 9436969
    Abstract: One embodiment of the present invention sets forth a technique for redistributing geometric primitives generated by tessellation and geometry shaders for processing by multiple graphics pipelines. Geometric primitives that are generated in a first processing cycle are collected and redistributed more evenly and in smaller tasks to the multiple graphics pipelines for vertex processing in a second processing cycle. The smaller tasks do not exceed the resource limits of a graphics pipeline and the per-vertex processing workloads of the graphics pipelines in the second cycle are balanced and make full use of resources. Therefore, the performance of the tessellation and geometry shaders is improved.
    Type: Grant
    Filed: August 11, 2011
    Date of Patent: September 6, 2016
    Assignee: NVIDIA Corporation
    Inventors: Ziyad S. Hakura, Emmett M. Kilgariff, Dale L. Kirkland, Johnny S. Rhoades, Cynthia Ann Edgeworth Allison, Karim M. Abdalla
  • Patent number: 9069609
    Abstract: One embodiment of the present invention sets forth a technique for assigning a compute task to a first processor included in a plurality of processors. The technique involves analyzing each compute task in a plurality of compute tasks to identify one or more compute tasks that are eligible for assignment to the first processor, where each compute task is listed in a first table and is associated with a priority value and an allocation order that indicates relative time at which the compute task was added to the first table. The technique further involves selecting a first task compute from the identified one or more compute tasks based on at least one of the priority value and the allocation order, and assigning the first compute task to the first processor for execution.
    Type: Grant
    Filed: January 18, 2012
    Date of Patent: June 30, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
  • Patent number: 8984183
    Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: March 17, 2015
    Assignee: Nvidia Corporation
    Inventors: Timothy John Purcell, Lacky V. Shah, Jerome F. Duluk, Jr., Sean J. Treichler, Karim M. Abdalla, Philip Alexander Cuadra, Brian Pharris
  • Patent number: 8917271
    Abstract: One embodiment of the present invention sets forth a technique for redistributing geometric primitives generated by tessellation and geometry shaders for per-vertex by multiple graphics pipelines. Geometric primitives that are generated in a first processing stage are collected and redistributed more evenly and in smaller batches to the multiple graphics pipelines for vertex processing in a second processing stage. The smaller batches do not exceed the resource limits of a graphics pipeline and the per-vertex processing workloads of the graphics pipelines in the second stage are balanced. Therefore, the performance of the tessellation and geometry shaders is improved.
    Type: Grant
    Filed: October 4, 2010
    Date of Patent: December 23, 2014
    Assignee: NVIDIA Corporation
    Inventors: Johnny S. Rhoades, Ziyad S. Hakura, Emmett M. Kilgariff, Dale L. Kirkland, Cynthia Ann Edgeworth Allison, Karl M. Wurstner, Karim M. Abdalla
  • Publication number: 20140118376
    Abstract: One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the first plurality of graphics primitives should be replayed from the buffer, and, in response, replaying the first plurality of graphics primitives against a first tile included in a first plurality of tiles. Replaying the first plurality of graphics primitives includes comparing each graphics primitive against the first tile to determine whether the graphics primitive intersects the first tile, determining that one or more graphics primitives intersects the first tile, and transmitting the one or more graphics primitives and one or more associated state bundles to a screen-space pipeline for processing.
    Type: Application
    Filed: October 4, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Walter R. STEINER, Cynthia Ann Edgeworth ALLISON, Rouslan DIMITROV, Karim M. ABDALLA, Dale L. KIRKLAND, Emmett M. KILGARIFF
  • Publication number: 20140118375
    Abstract: One embodiment of the present invention sets forth a technique for managing buffer table entries in a tile-based architecture. The technique includes binding a plurality of shader registers to a buffer table entry. The technique further includes processing at least one tile by reading a buffer table index stored in the shader register to access the buffer table entry, reading a buffer address stored in the buffer table entry, accessing data associated with the buffer address, and unbinding the shader register from the buffer table entry. The technique further includes determining that none of the shader registers is still bound to the buffer table entry and, in response, causing a release packet to be inserted into an instruction stream. The technique further includes determining that a last tile has been processed and, in response, transmitting the release packet to cause the buffer table entry to be released.
    Type: Application
    Filed: October 3, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Karim M. ABDALLA, Ziyad S. HAKURA, Cynthia Ann Edgeworth ALLISON, Dale L. KIRKLAND
  • Publication number: 20130268942
    Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.
    Type: Application
    Filed: April 9, 2012
    Publication date: October 10, 2013
    Inventors: Jerome F. DULUK, JR., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla
  • Publication number: 20130198759
    Abstract: A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a cooperative thread array” (“CTA”) may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAS that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries.
    Type: Application
    Filed: January 31, 2012
    Publication date: August 1, 2013
    Inventors: Lacky V. SHAH, Karim M. ABDALLA, Sean J. TREICHLER, Abraham B. DE WAAL
  • Publication number: 20130185728
    Abstract: One embodiment of the present invention sets forth a technique for assigning a compute task to a first processor included in a plurality of processors. The technique involves analyzing each compute task in a plurality of compute tasks to identify one or more compute tasks that are eligible for assignment to the first processor, where each compute task is listed in a first table and is associated with a priority value and an allocation order that indicates relative time at which the compute task was added to the first table. The technique further involves selecting a first task compute from the identified one or more compute tasks based on at least one of the priority value and the allocation order, and assigning the first compute task to the first processor for execution.
    Type: Application
    Filed: January 18, 2012
    Publication date: July 18, 2013
    Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, JR., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
  • Publication number: 20130185725
    Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
    Type: Application
    Filed: January 18, 2012
    Publication date: July 18, 2013
    Inventors: Karim M. ABDALLA, Lacky V. Shah, Jerome F. Duluk, JR., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
  • Publication number: 20130160021
    Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Inventors: Timothy John PURCELL, Lacky V. Shah, Jerome F. Duluk, JR., Sean J. Treichler, Karim M. Abdalla, Philip Alexander Cuadra, Brian Pharris
  • Patent number: 8456481
    Abstract: A method of organizing memory for storage of texture data, in accordance with one embodiment of the invention, includes accessing a size of a mipmap level of a texture map. A block dimension may be determined based on the size of the mipmap level. A memory space (e.g., computer-readable medium) may be logically divided into a plurality of whole number of blocks of variable dimension. The dimension of the blocks is measured in units of gobs and each gob is of a fixed dimension of bytes. A mipmap level of a texture map may be stored in the memory space. A texel coordinate of said mipmap level may be converted into a byte address of the memory space by determining a gob address of a gob in which the texel coordinate resides and determining a byte address within the particular gob.
    Type: Grant
    Filed: March 16, 2012
    Date of Patent: June 4, 2013
    Assignee: Nvidia Corporation
    Inventors: Walter E. Donovan, Emmett M. Kilgariff, Karim M. Abdalla, Joel J. McCormack
  • Publication number: 20130117758
    Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.
    Type: Application
    Filed: November 8, 2011
    Publication date: May 9, 2013
    Inventors: Philip Alexander Cuadra, Karim M. Abdalla, Jerome F. Duluk, JR., Luke Durant, Gerald F. Luiz, Timothy John Purcell, Lacky V. Shah