Patents by Inventor Andrei Khodakovsky
Andrei Khodakovsky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10877757
Abstract: A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers and inserts these instructions into the program code. A parallel processor executes this compiled code and, in doing so, causes a thread to traverse the chain of pointers and bind the uniform constant to a uniform register at runtime. Each thread in a group of threads executing on the parallel processor may then access the uniform constant.
Type: Grant
Filed: February 14, 2018
Date of Patent: December 29, 2020
Assignee: NVIDIA Corporation
Inventors: Ajay Tirumala, Jack Choquette, Manan Patel, Shirish Gadre, Praveen Kaushik, Amanpreet Grewal, Shekhar Divekar, Andrei Khodakovsky
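The pointer-chain traversal described in the abstract above can be illustrated with a minimal sketch. Memory is modeled here as a plain dictionary from address to stored value; the names `memory`, `resolve_chain`, and `uniform_register`, as well as all addresses and offsets, are hypothetical and are not taken from the patent.

```python
def resolve_chain(memory, root_addr, offsets):
    """Follow a chain of pointers that starts at a root table.

    Every step except the last reads a pointer stored at (base + offset);
    the last step reads the constant itself. `offsets` must be non-empty.
    """
    addr = root_addr
    for off in offsets[:-1]:
        addr = memory[addr + off]      # load the next pointer in the chain
    return memory[addr + offsets[-1]]  # load the constant at the end


# Hypothetical flat memory: address -> stored value.
memory = {
    0x100 + 8: 0x200,  # root table slot points to a descriptor table
    0x200 + 4: 0x300,  # descriptor slot points to a constant buffer
    0x300 + 0: 42,     # the uniform constant
}

# One thread walks the chain once; the result then plays the role of the
# uniform register that the other threads in the group read.
uniform_register = resolve_chain(memory, 0x100, [8, 4, 0])
```

In this sketch the list of offsets stands in for the additional instructions that the compiler would insert into the program code.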
-
Patent number: 10430915
Abstract: One or more copy commands are scheduled for locating one or more pages of data in a local memory of a graphics processing unit (GPU) for more efficient access to the pages of data during rendering. A first processing unit that is coupled to a first GPU receives a notification that an access request count has reached a specified threshold. The first processing unit schedules a copy command to copy the first page of data to a first memory circuit of the first GPU from a second memory circuit of the second GPU. The copy command is included within a GPU command stream.
Type: Grant
Filed: January 24, 2018
Date of Patent: October 1, 2019
Assignee: NVIDIA Corporation
Inventors: Andrei Khodakovsky, Kirill A. Dmitriev, Rouslan L. Dimitrov, Tzyywei Hwang, Wishwesh Anil Gandhi, Lacky Vasant Shah
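As a rough illustration of the scheduling step in the abstract above, the sketch below places a copy command in the same ordered stream as the rendering commands. The `CopyPage` record, the `on_threshold_reached` handler, and the string draw commands are assumptions made for the example, not the driver interface described in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyPage:
    """Hypothetical copy command: move one page between GPU memories."""
    page: int
    src_gpu: int
    dst_gpu: int


def on_threshold_reached(command_stream, page, owner_gpu, requesting_gpu):
    """Schedule a copy of `page` into the requesting GPU's local memory.

    The command is appended to the ordinary command stream, so it is
    ordered relative to the draw commands around it.
    """
    command_stream.append(
        CopyPage(page, src_gpu=owner_gpu, dst_gpu=requesting_gpu)
    )


stream = ["draw A"]
on_threshold_reached(stream, page=7, owner_gpu=1, requesting_gpu=0)
stream.append("draw B")
```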
-
Patent number: 10402937
Abstract: A method for rendering graphics frames allocates rendering work to multiple graphics processing units (GPUs) that are configured to allow access to pages of data stored in locally attached memory of a peer GPU. The method includes the steps of generating, by a first GPU coupled to a first memory circuit, one or more first memory access requests to render a first primitive for a first frame, where at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU. The first GPU requests the first page of data through a first data link coupling the first GPU to the second GPU and a register circuit within the first GPU accumulates an access request count for the first page of data. The first GPU notifies a driver that the access request count has reached a specified threshold.
Type: Grant
Filed: December 28, 2017
Date of Patent: September 3, 2019
Assignee: NVIDIA Corporation
Inventors: Rouslan L. Dimitrov, Kirill A. Dmitriev, Andrei Khodakovsky, Tzyywei Hwang, Wishwesh Anil Gandhi, Lacky Vasant Shah
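The per-page access counting in the abstract above can be sketched in software as follows. The class `PeerAccessCounter`, its `notify` callback, and the choice to notify exactly once when the count first equals the threshold are assumptions for illustration; the patent describes a register circuit, not this code.

```python
from collections import Counter


class PeerAccessCounter:
    """Software stand-in for a per-page access request counter."""

    def __init__(self, threshold, notify):
        self.threshold = threshold
        self.notify = notify      # callback standing in for the driver
        self.counts = Counter()   # page id -> remote accesses seen so far

    def record_remote_access(self, page):
        """Count one access to a page that resides in a peer GPU's memory."""
        self.counts[page] += 1
        # Assumption: notify once, at the moment the threshold is reached.
        if self.counts[page] == self.threshold:
            self.notify(page)


notified = []
counter = PeerAccessCounter(threshold=3, notify=notified.append)
for page in [7, 7, 9, 7, 7]:
    counter.record_remote_access(page)
```

After this loop only page 7 has been reported, because page 9 was accessed once and stays below the threshold.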
-
Publication number: 20190206018
Abstract: One or more copy commands are scheduled for locating one or more pages of data in a local memory of a graphics processing unit (GPU) for more efficient access to the pages of data during rendering. A first processing unit that is coupled to a first GPU receives a notification that an access request count has reached a specified threshold. The first processing unit schedules a copy command to copy the first page of data to a first memory circuit of the first GPU from a second memory circuit of the second GPU. The copy command is included within a GPU command stream.
Type: Application
Filed: January 24, 2018
Publication date: July 4, 2019
Inventors: Andrei Khodakovsky, Kirill A. Dmitriev, Rouslan L. Dimitrov, Tzyywei Hwang, Wishwesh Anil Gandhi, Lacky Vasant Shah
-
Publication number: 20190206023
Abstract: A method for rendering graphics frames allocates rendering work to multiple graphics processing units (GPUs) that are configured to allow access to pages of data stored in locally attached memory of a peer GPU. The method includes the steps of generating, by a first GPU coupled to a first memory circuit, one or more first memory access requests to render a first primitive for a first frame, where at least one of the first memory access requests targets a first page of data that physically resides within a second memory circuit coupled to a second GPU. The first GPU requests the first page of data through a first data link coupling the first GPU to the second GPU and a register circuit within the first GPU accumulates an access request count for the first page of data. The first GPU notifies a driver that the access request count has reached a specified threshold.
Type: Application
Filed: December 28, 2017
Publication date: July 4, 2019
Inventors: Rouslan L. Dimitrov, Kirill A. Dmitriev, Andrei Khodakovsky, Tzyywei Hwang, Wishwesh Anil Gandhi, Lacky Vasant Shah
-
Publication number: 20190146817
Abstract: A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers and inserts these instructions into the program code. A parallel processor executes this compiled code and, in doing so, causes a thread to traverse the chain of pointers and bind the uniform constant to a uniform register at runtime. Each thread in a group of threads executing on the parallel processor may then access the uniform constant.
Type: Application
Filed: February 14, 2018
Publication date: May 16, 2019
Inventors: Ajay TIRUMALA, Jack CHOQUETTE, Manan PATEL, Shirish GADRE, Praveen KAUSHIK, Amanpreet GREWAL, Shekhar DIVEKAR, Andrei KHODAKOVSKY
-
Patent number: 10083036
Abstract: One embodiment of the present invention sets forth a technique for managing graphics processing resources in a tile-based architecture. The technique includes storing a release packet associated with a graphics processing resource in a buffer and initiating a replay of graphics primitives stored in the buffer and associated with the graphics processing resource. The technique further includes, for each tile included in a plurality of tiles and processed during the replay, reading the release packet and determining whether the tile is a last tile processed during the replay. The technique further includes determining not to transmit the release packet to a screen-space pipeline and continuing to read graphics data stored in the buffer if the tile is not the last tile to be processed during the replay, or transmitting the release packet to the screen-space pipeline if the tile is the last tile to be processed during the replay.
Type: Grant
Filed: October 3, 2013
Date of Patent: September 25, 2018
Assignee: NVIDIA CORPORATION
Inventors: Ziyad S. Hakura, Cynthia Ann Edgeworth Allison, Dale L. Kirkland, Andrei Khodakovsky, Jeffrey A. Bolz
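The replay rule in the abstract above (forward the release packet only while processing the last tile) can be sketched as a small loop. The buffer layout, the `("prim", ...)` and `("release", ...)` entry kinds, and the `replay` function are hypothetical names chosen for this example.

```python
def replay(buffer, tiles):
    """Replay buffered entries once per tile.

    Release packets are skipped for every tile except the last one, so the
    resource stays alive until no remaining tile can still need it.
    Returns the entries that would be sent to the screen-space pipeline.
    """
    sent = []
    for index, tile in enumerate(tiles):
        is_last_tile = index == len(tiles) - 1
        for kind, payload in buffer:
            if kind == "release" and not is_last_tile:
                continue  # do not transmit; keep reading the buffer
            sent.append((tile, kind, payload))
    return sent


buffer = [("prim", "tri0"), ("release", "texA"), ("prim", "tri1")]
sent = replay(buffer, tiles=["t0", "t1"])
releases = [entry for entry in sent if entry[1] == "release"]
```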
-
Patent number: 10032242
Abstract: A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command.
Type: Grant
Filed: October 1, 2013
Date of Patent: July 24, 2018
Assignee: NVIDIA CORPORATION
Inventors: Ziyad S. Hakura, Jeffrey A. Bolz, Amanpreet Grewal, Matthew Johnson, Andrei Khodakovsky
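The decision described in the abstract above reduces to a subset test on two sets of render targets. The sketch below models it with Python sets; the function `on_draw`, the command strings, and the render target names are illustrative assumptions rather than the actual driver commands.

```python
def on_draw(current, requested, commands):
    """Handle one draw command against the currently bound render targets.

    current   -- set of render targets already bound
    requested -- render targets the draw command needs
    commands  -- list that receives the issued commands, in order
    Returns the updated set of bound render targets.
    """
    requested = set(requested)
    if not requested <= current:
        # Something requested is not bound yet: flush, grow the set, rebind.
        commands.append("flush-tiling-unit")
        current = current | requested
        commands.append(("bind-render-targets", tuple(sorted(requested))))
    # Otherwise every requested target is already bound: no flush is issued.
    commands.append("draw")
    return current


commands = []
current = on_draw({"rt0"}, {"rt0", "rt1"}, commands)  # rt1 missing: flush
current = on_draw(current, {"rt1"}, commands)         # subset: no flush
```

Because the current set only grows, a later draw that uses a subset of the targets already bound avoids the flush entirely.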
-
Patent number: 9734548
Abstract: One embodiment of the present invention includes techniques for adaptively sizing cache tiles in a graphics system. A device driver associated with a graphics system sets a cache tile size associated with a cache tile to a first size. The device driver detects a change from a first render target configuration that includes a first set of render targets to a second render target configuration that includes a second set of render targets. The device driver sets the cache tile size to a second size based on the second render target configuration. One advantage of the disclosed approach is that the cache tile size is adaptively sized, resulting in fewer cache tiles for less complex render target configurations. Adaptively sizing cache tiles leads to more efficient processor utilization and reduced power requirements. In addition, a unified L2 cache tile allows dynamic partitioning of cache memory between cache tile data and other data.
Type: Grant
Filed: August 28, 2013
Date of Patent: August 15, 2017
Assignee: NVIDIA Corporation
Inventors: Ziyad S. Hakura, Rouslan Dimitrov, Emmett M. Kilgariff, Andrei Khodakovsky
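The abstract above does not say how the new tile size is computed, so the following is only one plausible sketch. It assumes a fixed cache budget in bytes, square tiles with a power-of-two side, and a configuration described by the bytes per pixel of each bound render target; the function `cache_tile_side` and all numbers are hypothetical.

```python
def cache_tile_side(cache_bytes, bytes_per_pixel_per_target):
    """Pick the largest power-of-two tile side that fits the cache budget.

    Assumption: one tile must hold every bound render target, so the cost
    of a pixel is the sum of the per-target byte sizes.
    """
    total_bpp = sum(bytes_per_pixel_per_target)
    if total_bpp <= 0:
        raise ValueError("at least one render target is required")
    max_pixels = cache_bytes // total_bpp
    side = 1
    while (side * 2) ** 2 <= max_pixels:
        side *= 2
    return side


budget = 256 * 1024                             # hypothetical 256 KiB budget
simple = cache_tile_side(budget, [4])           # one 32-bit render target
complex_ = cache_tile_side(budget, [4, 4, 4, 4])  # four 32-bit targets
```

Under these assumptions the simpler configuration gets larger tiles, and therefore fewer of them per frame, which matches the advantage stated in the abstract.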
-
Publication number: 20170206623
Abstract: A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command.
Type: Application
Filed: October 1, 2013
Publication date: July 20, 2017
Applicant: NVIDIA CORPORATION
Inventors: Ziyad S. HAKURA, Jeffrey A. BOLZ, Amanpreet GREWAL, Matthew JOHNSON, Andrei KHODAKOVSKY
-
Patent number: 9230362
Abstract: A system, method, and computer program product enable compression with programmable sample locations, where the compression is a function of the programmable sample locations. The method includes the steps of storing a first value specifying a programmed sample location within a pixel in a sample pattern table and storing, in a memory, geometric surface parameters corresponding to a first attribute at the programmed sample location within a first pixel of a display surface. An instruction to store a second value specifying the programmed sample location within the pixel in the sample pattern table is received. The attribute is reconstructed based on the geometric surface parameters and the first value.
Type: Grant
Filed: September 11, 2013
Date of Patent: January 5, 2016
Assignee: NVIDIA Corporation
Inventors: Eric B. Lum, Jeffrey Alan Bolz, Rui Manuel Bastos, Andrei Khodakovsky, Christian Johannes Amsinck, Bengt-Olaf Schneider
-
Patent number: 9230363
Abstract: A system, method, and computer program product enable compression with programmable sample locations, where the compression is a function of the programmable sample locations. The method includes the steps of storing a first value specifying a programmed sample location within a pixel in a first sample pattern table that is associated with a first display surface and storing, in a memory, geometric surface parameters corresponding to a first attribute at the programmed sample location within a first pixel of the first display surface. A second value specifying the programmed sample location within the pixel in a second sample pattern table that is associated with a second display surface is also stored and the first attribute is reconstructed based on the geometric surface parameters and the first value.
Type: Grant
Filed: September 11, 2013
Date of Patent: January 5, 2016
Assignee: NVIDIA Corporation
Inventors: Eric B. Lum, Jeffrey Alan Bolz, Rui Manuel Bastos, Andrei Khodakovsky, Christian Johannes Amsinck, Bengt-Olaf Schneider
-
Publication number: 20150109315
Abstract: A system, method, and computer program product are provided for mapping tiles to physical memory locations. In use, a plurality of virtual tiles associated with a texture is identified. Additionally, a request to perform a mapping of the plurality of virtual tiles to one or more physical memory locations is received. Further, the plurality of virtual tiles is mapped to the one or more physical memory locations, utilizing a page table.
Type: Application
Filed: October 23, 2013
Publication date: April 23, 2015
Applicant: NVIDIA Corporation
Inventors: Amanpreet Grewal, Andrei Khodakovsky, Yu Denny Dong, Henry Packard Moreton, Naveen Leekha
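A minimal sketch of the mapping step in the abstract above, using a dictionary as the page table. The tile coordinates, the physical addresses, and the functions `map_tiles` and `translate` are assumptions for illustration; a real page table is a hardware-walked structure, not a Python dict.

```python
def map_tiles(page_table, virtual_tiles, physical_pages):
    """Record a virtual-tile -> physical-page mapping in the page table.

    Several virtual tiles may share one physical page by repeating its
    address in `physical_pages`.
    """
    if len(virtual_tiles) != len(physical_pages):
        raise ValueError("one physical page is needed per virtual tile")
    for tile, page in zip(virtual_tiles, physical_pages):
        page_table[tile] = page


def translate(page_table, tile):
    """Return the physical page backing a tile, or None if it is unmapped."""
    return page_table.get(tile)


page_table = {}
# Hypothetical texture tiles identified by (x, y) tile coordinates.
map_tiles(page_table, [(0, 0), (1, 0), (2, 0)], [0x4000, 0x9000, 0x4000])
```

Tiles that were never mapped simply have no entry, which is how a sparsely resident texture would appear in this model.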
-
Publication number: 20150070380
Abstract: A system, method, and computer program product are provided for using compression with programmable sample locations, where the compression is a function of the programmable sample locations. The method includes the steps of storing a first value specifying a programmed sample location within a pixel in a sample pattern table and storing, in a memory, geometric surface parameters corresponding to a first attribute at the programmed sample location within a first pixel of a display surface. An instruction to store a second value specifying the programmed sample location within the pixel in the sample pattern table is received. The attribute is reconstructed based on the geometric surface parameters and the first value.
Type: Application
Filed: September 11, 2013
Publication date: March 12, 2015
Applicant: NVIDIA Corporation
Inventors: Eric B. Lum, Jeffrey Alan Bolz, Rui Manuel Bastos, Andrei Khodakovsky, Christian Johannes Amsinck, Bengt-Olaf Schneider
-
Publication number: 20150070381
Abstract: A system, method, and computer program product are provided for using compression with programmable sample locations, where the compression is a function of the programmable sample locations. The method includes the steps of storing a first value specifying a programmed sample location within a pixel in a first sample pattern table that is associated with a first display surface and storing, in a memory, geometric surface parameters corresponding to a first attribute at the programmed sample location within a first pixel of the first display surface. A second value specifying the programmed sample location within the pixel in a second sample pattern table that is associated with a second display surface is also stored and the first attribute is reconstructed based on the geometric surface parameters and the first value.
Type: Application
Filed: September 11, 2013
Publication date: March 12, 2015
Applicant: NVIDIA Corporation
Inventors: Eric B. Lum, Jeffrey Alan Bolz, Rui Manuel Bastos, Andrei Khodakovsky, Christian Johannes Amsinck, Bengt-Olaf Schneider
-
Publication number: 20140118379
Abstract: One embodiment of the present invention includes techniques for adaptively sizing cache tiles in a graphics system. A device driver associated with a graphics system sets a cache tile size associated with a cache tile to a first size. The device driver detects a change from a first render target configuration that includes a first set of render targets to a second render target configuration that includes a second set of render targets. The device driver sets the cache tile size to a second size based on the second render target configuration. One advantage of the disclosed approach is that the cache tile size is adaptively sized, resulting in fewer cache tiles for less complex render target configurations. Adaptively sizing cache tiles leads to more efficient processor utilization and reduced power requirements. In addition, a unified L2 cache tile allows dynamic partitioning of cache memory between cache tile data and other data.
Type: Application
Filed: August 28, 2013
Publication date: May 1, 2014
Applicant: NVIDIA CORPORATION
Inventors: Ziyad S. HAKURA, Rouslan DIMITROV, Emmett M. KILGARIFF, Andrei KHODAKOVSKY
-
Publication number: 20140118363
Abstract: A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command.
Type: Application
Filed: October 1, 2013
Publication date: May 1, 2014
Applicant: NVIDIA CORPORATION
Inventors: Ziyad S. HAKURA, Jeffrey A. BOLZ, Amanpreet GREWAL, Matthew JOHNSON, Andrei KHODAKOVSKY
-
Publication number: 20140118373
Abstract: One embodiment of the present invention sets forth a technique for managing graphics processing resources in a tile-based architecture. The technique includes storing a release packet associated with a graphics processing resource in a buffer and initiating a replay of graphics primitives stored in the buffer and associated with the graphics processing resource. The technique further includes, for each tile included in a plurality of tiles and processed during the replay, reading the release packet and determining whether the tile is a last tile processed during the replay. The technique further includes determining not to transmit the release packet to a screen-space pipeline and continuing to read graphics data stored in the buffer if the tile is not the last tile to be processed during the replay, or transmitting the release packet to the screen-space pipeline if the tile is the last tile to be processed during the replay.
Type: Application
Filed: October 3, 2013
Publication date: May 1, 2014
Applicant: NVIDIA CORPORATION
Inventors: Ziyad S. HAKURA, Cynthia Ann Edgeworth ALLISON, Dale L. KIRKLAND, Andrei KHODAKOVSKY, Jeffrey A. BOLZ
-
Patent number: 8427474
Abstract: One embodiment of the present invention sets forth a method for dynamically load balancing rendering operations across an IGPU and a DGPU. For each frame, the graphics driver configures the IGPU to pre-compute Z-values for a portion of the display surface and to write feedback data to the system memory indicating the time that the IGPU used to process the frame. The graphics driver then configures the DGPU to use the pre-computed Z-values while rendering to the complete display surface and to write feedback data to the system memory indicating the time that the DGPU used to process the frame. The graphics driver uses the feedback data from the IGPU and DGPU in conjunction with the percentage of the display surface that the IGPU Z-rendered for the frame to scale the portion of the display surface that the IGPU Z-renders for one or more subsequent frames. In this fashion, overall processing within the graphics pipeline is optimized across the IGPU and DGPU.
Type: Grant
Filed: October 3, 2008
Date of Patent: April 23, 2013
Assignee: Nvidia Corporation
Inventors: Andrei Khodakovsky, Franck R. Diard
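The abstract above states that the driver scales the IGPU's share of the display surface from the two timing measurements, but it does not give the scaling rule. The sketch below assumes a simple proportional rule that tries to make the IGPU's time match the DGPU's time; the function `next_z_fraction`, its parameters, and the rule itself are assumptions, not the patented method.

```python
def next_z_fraction(fraction, igpu_time, dgpu_time, max_fraction=1.0):
    """Estimate the share of the surface the IGPU should Z-render next frame.

    Assumption: IGPU time grows roughly linearly with its share, so the
    share is scaled by dgpu_time / igpu_time and clamped to a valid range.
    """
    if igpu_time <= 0:
        return fraction  # no usable feedback; keep the current share
    scaled = fraction * dgpu_time / igpu_time
    return max(0.0, min(max_fraction, scaled))


# The IGPU took twice as long as the DGPU, so its share is halved.
slower = next_z_fraction(0.5, igpu_time=8.0, dgpu_time=4.0)
# The IGPU finished early, so its share grows, up to the whole surface.
faster = next_z_fraction(0.5, igpu_time=2.0, dgpu_time=8.0)
```

A production driver would likely smooth this estimate over several frames to avoid oscillation; that refinement is omitted here.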
-
Patent number: 8228337
Abstract: One embodiment of the present invention sets forth a method for dynamically load balancing rendering operations across an IGPU and a DGPU. For each frame, the graphics driver configures the IGPU to pre-compute Z-values for a portion of the display surface and to write feedback data to the system memory indicating the time that the IGPU used to process the frame. The graphics driver then configures the DGPU to use the pre-computed Z-values while rendering to the complete display surface and to write feedback data to the system memory indicating the time that the DGPU used to process the frame. The graphics driver uses the feedback data from the IGPU and DGPU in conjunction with the percentage of the display surface that the IGPU Z-rendered for the frame to scale the portion of the display surface that the IGPU Z-renders for one or more subsequent frames. In this fashion, overall processing within the graphics pipeline is optimized across the IGPU and DGPU.
Type: Grant
Filed: October 3, 2008
Date of Patent: July 24, 2012
Assignee: NVIDIA Corporation
Inventors: Andrei Khodakovsky, Franck R. Diard