Patents by Inventor Ignacio Llamas

Ignacio Llamas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHOD FOR HANDLING OF OUT-OF-ORDER OPAQUE AND ALPHA RAY/PRIMITIVE INTERSECTIONS

Publication number: 20200051316

Abstract: A hardware-based traversal coprocessor provides acceleration of tree traversal operations searching for intersections between primitives represented in a tree data structure and a ray. The primitives may include opaque and alpha triangles used in generating a virtual scene. The hardware-based traversal coprocessor is configured to determine primitives intersected by the ray, and return intersection information to a streaming multiprocessor for further processing. The hardware-based traversal coprocessor is configured to provide a deterministic result of intersected triangles regardless of the order that the memory subsystem returns triangle range blocks for processing, while opportunistically eliminating alpha intersections that lie further along the length of the ray than closer opaque intersections.

Type: Application

Filed: August 10, 2018

Publication date: February 13, 2020

Inventors: Samuli LAINE, Tero KARRAS, Greg MUTHLER, William Parsons NEWHALL, JR., Ronald Charles BABICH, Ignacio LLAMAS, John BURGESS
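
The order-independence described in the abstract above (publication 20200051316) can be illustrated with a small software sketch. This is not the coprocessor's actual logic; the hit tuple layout, the name resolve_hits, and the tie-break on primitive id are assumptions made for illustration only.

```python
def resolve_hits(blocks):
    """Combine ray hits that arrive block by block, in any block order.

    Each hit is a hypothetical tuple (t, prim_id, kind), where t is the
    distance along the ray and kind is "opaque" or "alpha".
    Returns (closest_opaque_hit, sorted_alpha_hits_not_behind_it).
    """
    closest_opaque = None
    alpha_hits = []
    for block in blocks:
        for hit in block:
            t, prim_id, kind = hit
            # Anything strictly behind a known opaque hit cannot be visible.
            if closest_opaque is not None and t > closest_opaque[0]:
                continue
            if kind == "opaque":
                # Assumed tie-break on prim_id keeps the result independent
                # of arrival order when two opaque hits share the same t.
                if closest_opaque is None or (t, prim_id) < closest_opaque[:2]:
                    closest_opaque = hit
                    # Opportunistically drop alpha hits now known to be farther.
                    alpha_hits = [a for a in alpha_hits if a[0] <= t]
            else:
                alpha_hits.append(hit)
    # Sorting removes any dependence on the order the blocks were returned in.
    return closest_opaque, sorted(alpha_hits)


blocks = [
    [(5.0, 1, "opaque"), (2.0, 2, "alpha")],
    [(3.0, 3, "opaque"), (4.0, 4, "alpha")],
    [(1.0, 5, "alpha")],
]
result_forward = resolve_hits(blocks)
result_reversed = resolve_hits(list(reversed(blocks)))
```

Both calls return the opaque hit at t = 3.0 together with the alpha hits at t = 1.0 and t = 2.0; the alpha hit at t = 4.0 is discarded in either order.
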
ROBUST, EFFICIENT MULTIPROCESSOR-COPROCESSOR INTERFACE

Publication number: 20200050451

Abstract: Systems and methods for an efficient and robust multiprocessor-coprocessor interface that may be used between a streaming multiprocessor and an acceleration coprocessor in a GPU are provided. According to an example implementation, in order to perform an acceleration of a particular operation using the coprocessor, the multiprocessor: issues a series of write instructions to write input data for the operation into coprocessor-accessible storage locations; issues an operation instruction to cause the coprocessor to execute the particular operation; and then issues a series of read instructions to read result data of the operation from coprocessor-accessible storage locations to multiprocessor-accessible storage locations.

Type: Application

Filed: August 10, 2018

Publication date: February 13, 2020

Inventors: Ronald Babich, John BURGESS, Jack CHOQUETTE, Tero KARRAS, Samuli LAINE, Ignacio LLAMAS, Gregory MUTHLER, William Parsons NEWHALL, JR.
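
The write / operate / read sequence in the abstract above (publication 20200050451) might be pictured roughly as follows. ToyCoprocessor, its slot dictionary, and the "sum" operation are hypothetical stand-ins; the real interface is a set of hardware instructions, not Python calls.

```python
class ToyCoprocessor:
    """Hypothetical coprocessor with its own input and result storage."""

    def __init__(self):
        self.inputs = {}   # coprocessor-accessible input slots
        self.results = {}  # coprocessor-accessible result slots

    def write(self, slot, value):
        self.inputs[slot] = value

    def execute(self, op):
        if op != "sum":
            raise ValueError(f"unsupported operation: {op}")
        self.results[0] = sum(self.inputs.values())
        self.inputs.clear()  # inputs are consumed by the operation

    def read(self, slot):
        return self.results[slot]


def accelerate_sum(cop, values):
    # 1. A series of writes places the input data where the coprocessor can see it.
    for slot, value in enumerate(values):
        cop.write(slot, value)
    # 2. A single operation instruction triggers the work.
    cop.execute("sum")
    # 3. A read copies the result back to the caller's own storage.
    return cop.read(0)


total = accelerate_sum(ToyCoprocessor(), [1, 2, 3])
```
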
QUERY-SPECIFIC BEHAVIORAL MODIFICATION OF TREE TRAVERSAL

Publication number: 20200051315

Abstract: Methods and systems are described in some examples for changing the traversal of an acceleration data structure in a highly dynamic query-specific manner, with each query specifying test parameters, a test opcode and a mapping of test results to actions. In an example ray tracing implementation, traversal of a bounding volume hierarchy by a ray is performed with the default behavior of the traversal being changed in accordance with results of a test performed using the test opcode and test parameters specified in the ray data structure and another test parameter specified in a node of the bounding volume hierarchy. In an example implementation, a traversal coprocessor is configured to perform the traversal of the bounding volume hierarchy.

Type: Application

Filed: August 10, 2018

Publication date: February 13, 2020

Inventors: Samuli Laine, Timo AILA, Tero KARRAS, Gregory MUTHLER, William Parsons NEWHALL, JR., Ronald Charles BABICH, JR., Craig KOLB, Ignacio LLAMAS
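
As a rough illustration of the abstract above (publication 20200051315), the sketch below lets each query carry a test opcode, a test parameter, and a mapping from the test result to an action, while each node carries the other test parameter. The opcode names, the dictionary layout, and the two actions ("descend" and "cull") are assumptions, not the patent's encoding.

```python
import operator

# Hypothetical opcode table: each opcode compares the node parameter
# with the query parameter.
OPCODES = {
    "always": lambda node_param, query_param: True,
    "less": operator.lt,
    "equal": operator.eq,
    "greater": operator.gt,
}


def traverse(node, query, visited):
    """Depth-first traversal whose behavior is modified per query."""
    test = OPCODES[query["opcode"]]
    passed = test(node["param"], query["param"])
    # The query itself maps the test result to an action.
    action = query["on_pass"] if passed else query["on_fail"]
    if action == "cull":
        return  # skip this node and its whole subtree
    visited.append(node["name"])
    for child in node.get("children", []):
        traverse(child, query, visited)


tree = {
    "name": "root", "param": 0,
    "children": [
        {"name": "A", "param": 1},
        {"name": "B", "param": 5},
    ],
}
query = {"opcode": "less", "param": 3, "on_pass": "descend", "on_fail": "cull"}
visited = []
traverse(tree, query, visited)
```

With this query, node B (parameter 5) fails the "less than 3" test and is culled, while root and A are visited.
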
SHADER BINDING MANAGEMENT IN RAY TRACING

Publication number: 20190311531

Abstract: In various examples, shader bindings may be recorded in a shader binding table that includes shader records. Geometry of a 3D scene may be instantiated using object instances, and each may be associated with a respective set of the shader records using a location identifier of the set of shader records in memory. The set of shader records may represent shader bindings for an object instance under various predefined conditions. One or more of these predefined conditions may be implicit in the way the shader records are arranged in memory (e.g., indexed by ray type, by sub-geometry, etc.). For example, a section selector value (e.g., a section index) may be computed to locate and select a shader record based at least in part on a result of a ray tracing query (e.g., what sub-geometry was hit, what ray type was traced, etc.).

Type: Application

Filed: April 5, 2019

Publication date: October 10, 2019

Inventors: Martin Stich, Ignacio Llamas, Steven Parker
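
One way to picture the record lookup in the abstract above (publication 20190311531) is an index computed from the instance's location identifier plus a section index derived from the ray tracing query. The particular formula below (sub-geometry major, ray type minor) is an assumed layout chosen for illustration; the abstract only says such conditions may be implicit in how the records are arranged.

```python
def select_shader_record(table, instance_offset, geometry_index, ray_type, num_ray_types):
    """Pick one shader record from a flat shader binding table.

    instance_offset: where this instance's set of records starts (its location identifier).
    geometry_index:  which sub-geometry the ray hit.
    ray_type:        which kind of ray was traced.
    """
    # Section index within the instance's set of records (assumed layout).
    section_index = geometry_index * num_ray_types + ray_type
    return table[instance_offset + section_index]


# Hypothetical table: one instance at offset 0 with one sub-geometry,
# a second instance at offset 2 with two sub-geometries, two ray types each.
table = [
    "inst0/geom0/primary", "inst0/geom0/shadow",
    "inst1/geom0/primary", "inst1/geom0/shadow",
    "inst1/geom1/primary", "inst1/geom1/shadow",
]
record = select_shader_record(table, instance_offset=2, geometry_index=1,
                              ray_type=1, num_ray_types=2)
```
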
REFLECTION DENOISING IN RAY-TRACING APPLICATIONS

Publication number: 20190287294

Abstract: Disclosed approaches may leverage the actual spatial and reflective properties of a virtual environment—such as the size, shape, and orientation of a bidirectional reflectance distribution function (BRDF) lobe of a light path and its position relative to a reflection surface, a virtual screen, and a virtual camera—to produce, for a pixel, an anisotropic kernel filter having dimensions and weights that accurately reflect the spatial characteristics of the virtual environment as well as the reflective properties of the surface. In order to accomplish this, geometry may be computed that corresponds to a projection of a reflection of the BRDF lobe below the surface along a view vector to the pixel. Using this approach, the dimensions of the anisotropic filter kernel may correspond to the BRDF lobe to accurately reflect the spatial characteristics of the virtual environment as well as the reflective properties of the surface.

Type: Application

Filed: March 15, 2019

Publication date: September 19, 2019

Inventors: Shiqiu Liu, Christopher Ryan Wyman, Jon Hasselgren, Jacob Munkberg, Ignacio Llamas
Low overhead thread synchronization using hardware-accelerated bounded circular queues

Patent number: 10002031

Abstract: A first thread is placed into a blocked state by causing the thread to perform a blocking pop operation on a hardware-accelerated, single-entry queue. When a synchronization event completes, a second thread may release the first thread from the blocked state by pushing a data value onto the hardware-accelerated, single-entry queue. The push operation satisfies the blocking pop operation, and the first thread is released.

Type: Grant

Filed: May 8, 2013

Date of Patent: June 19, 2018

Assignee: NVIDIA CORPORATION

Inventors: Ignacio Llamas, James David Balfour
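
The blocking-pop scheme in the abstract above (patent 10002031) has a close software analogue. The sketch below uses Python's standard queue.Queue with maxsize=1 as a stand-in for the hardware-accelerated single-entry queue; the function name run_demo and the value 42 are illustrative assumptions.

```python
import queue
import threading


def run_demo():
    gate = queue.Queue(maxsize=1)  # software stand-in for the single-entry queue
    log = []

    def first_thread():
        # Blocking pop: the thread sleeps here until a value is pushed.
        value = gate.get()
        log.append(("released", value))

    waiter = threading.Thread(target=first_thread)
    waiter.start()

    # Second thread (here, the main thread) finishes its synchronization event...
    log.append(("event-complete", None))
    # ...and the push satisfies the pending pop, releasing the first thread.
    gate.put(42)
    waiter.join()
    return log


events = run_demo()
```

The log always records the event completion before the release, because the waiter cannot return from get() until put() has been called.
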
System, method, and computer program product for a two-phase queue

Patent number: 9928104

Abstract: A system, method, and computer program product are provided for accessing a queue. The method includes receiving a first request to reserve a data record entry in a queue, updating a queue state block based on the first request, and returning a response to the request. A second request is received to commit the data record entry and the queue state block is updated based on the second request.

Type: Grant

Filed: June 19, 2013

Date of Patent: March 27, 2018

Assignee: NVIDIA Corporation

Inventors: William J. Dally, James David Balfour, Ignacio Llamas Ubieto
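
The reserve-then-commit protocol in the abstract above (patent 9928104) could look roughly like this in software. The class name, the fields that play the role of the queue state block, and the readable() helper are assumptions for illustration; the patent's state block is not described in this listing.

```python
class TwoPhaseQueue:
    """Toy queue where an entry is first reserved, then committed."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        # Hypothetical "queue state block": reservation count and commit flags.
        self.reserved = 0
        self.committed = [False] * capacity

    def reserve(self):
        """Phase 1: claim an entry; returns its index, or None if full."""
        if self.reserved == len(self.slots):
            return None
        index = self.reserved
        self.reserved += 1  # state block updated for the reservation
        return index

    def commit(self, index, record):
        """Phase 2: fill the reserved entry and mark it committed."""
        self.slots[index] = record
        self.committed[index] = True  # state block updated for the commit

    def readable(self):
        """Records visible to a consumer: the committed prefix, in order."""
        out = []
        for i in range(self.reserved):
            if not self.committed[i]:
                break
            out.append(self.slots[i])
        return out


q = TwoPhaseQueue(capacity=2)
a = q.reserve()
b = q.reserve()
q.commit(b, "second")
before = q.readable()   # entry a is reserved but not yet committed
q.commit(a, "first")
after = q.readable()
```

Separating the two phases lets several producers fill their entries concurrently while consumers only ever see fully written records.
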
Reduction of graphical processing through coverage testing

Patent number: 9760968

Abstract: A method in which an electronic device uses a graphics processor to subdivide an input image into multiple sub-regions. For each particular sub-region, a data structure is created that identifies one or more primitives that are visible in each quad of the particular sub-region. Existing coverage of one or more quads is erased based on graphics state (GState) information, resulting in surviving coverage for one or more quads.

Type: Grant

Filed: October 31, 2014

Date of Patent: September 12, 2017

Assignee: Samsung Electronics Co., Ltd.

Inventors: Derek Lentz, Michael Shebanow, Ignacio Llamas
Application programming interface to enable the construction of pipeline parallel programs

Patent number: 9697044

Abstract: An application programming interface (API) provides various software constructs that allow a developer to assemble a processing pipeline having arbitrary structure and complexity. Once assembled, the processing pipeline is configured to include a set of interconnected pipestages. Those pipestages are associated with one or more different CTAs that may execute in parallel with one another on a parallel processing unit. The developer specifies the configuration of the pipestages, including the configuration of the different CTAs across all pipestages, as well as the different processing operations performed by each different CTA.

Type: Grant

Filed: May 21, 2013

Date of Patent: July 4, 2017

Assignee: NVIDIA Corporation

Inventor: Ignacio Llamas
Technique for Computational Nested Parallelism

Publication number: 20170083373

Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.

Type: Application

Filed: December 2, 2016

Publication date: March 23, 2017

Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Christopher Lamb
Technique for computational nested parallelism

Patent number: 9513975

Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.

Type: Grant

Filed: May 2, 2012

Date of Patent: December 6, 2016

Assignee: NVIDIA Corporation

Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
Work-queue-based graphics processing unit work creation

Patent number: 9489245

Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.

Type: Grant

Filed: October 26, 2012

Date of Patent: November 8, 2016

Assignee: NVIDIA Corporation

Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
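
A simplified picture of the work queue in the abstract above (patent 9489245): GP_GET starts at the first entry, GP_PUT at the last free entry, and threads point entries in that range at command blocks. The class below is only a sketch; the next_free cursor and the use of Python callables as commands are assumptions, and the queue does not wrap around.

```python
class WorkQueue:
    """Toy work queue whose entries point at command blocks."""

    def __init__(self, size):
        self.entries = [None] * size
        # Initialization as described: GP_GET at the first entry,
        # GP_PUT at the last free entry, bounding the usable range.
        self.gp_get = 0
        self.gp_put = size - 1
        self.next_free = 0  # hypothetical cursor for the next entry to fill

    def submit(self, command_block):
        """Point the next free entry at a populated command block."""
        if self.next_free > self.gp_put:
            raise RuntimeError("work queue full")
        self.entries[self.next_free] = command_block
        self.next_free += 1

    def execute_all(self):
        """Consume entries from GP_GET onward and run their commands."""
        results = []
        while self.gp_get < self.next_free:
            block = self.entries[self.gp_get]
            results.extend(command() for command in block)
            self.gp_get += 1
        return results


wq = WorkQueue(size=2)
wq.submit([lambda: 1, lambda: 2])  # a command block with two commands
wq.submit([lambda: 3])
outputs = wq.execute_all()
```
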
Variable fragment shading with surface recasting

Patent number: 9355483

Abstract: A system, method, and computer program product are provided for shading primitive fragments. A target buffer may be recast when shaded samples that are covered by a primitive fragment are generated at a first shading rate using a first sampling mode, the shaded samples are stored in the target buffer that is associated with the first sampling mode and the first shading rate, a second sampling mode is determined, and the target buffer is associated with the second sampling mode. A sampling mode and/or shading rate may be changed for a primitive. A primitive fragment that is associated with a first sampling mode and a first shading rate is received and a second sampling mode is determined for the primitive fragment. Shaded samples corresponding to the primitive fragment are generated, at a second shading rate, using the second sampling mode and the shaded samples are stored in a target buffer.

Type: Grant

Filed: July 19, 2013

Date of Patent: May 31, 2016

Assignee: NVIDIA Corporation

Inventors: Eric B. Lum, Rouslan L. Dimitrov, Ignacio Llamas Ubieto, Patrick James Neill, Yury Uralsky, Albert Meixner
API for launching work on a processor

Patent number: 9268601

Abstract: One embodiment of the present invention sets forth a technique for launching work on a processor. The method includes the steps of initializing a first state object within a memory region accessible to a program executing on the processor, populating the first state object with data associated with a first workload that is generated by the program, and triggering the processing of the first workload on the processor according to the data within the first state object.

Type: Grant

Filed: March 31, 2011

Date of Patent: February 23, 2016

Assignee: NVIDIA Corporation

Inventors: Timothy Paul Lottes Farrar, Ignacio Llamas, Daniel Elliot Wexler, Craig Ross Duttweiler
Work-queue-based graphics processing unit work creation

Patent number: 9135081

Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.

Type: Grant

Filed: October 26, 2012

Date of Patent: September 15, 2015

Assignee: NVIDIA Corporation

Inventors: Ignacio Llamas, Craig Ross Duttweiler, Jeffrey A. Bolz, Daniel Elliot Wexler
VARIABLE FRAGMENT SHADING WITH SURFACE RECASTING

Publication number: 20150022537

Abstract: A system, method, and computer program product are provided for shading primitive fragments. A target buffer may be recast when shaded samples that are covered by a primitive fragment are generated at a first shading rate using a first sampling mode, the shaded samples are stored in the target buffer that is associated with the first sampling mode and the first shading rate, a second sampling mode is determined, and the target buffer is associated with the second sampling mode. A sampling mode and/or shading rate may be changed for a primitive. A primitive fragment that is associated with a first sampling mode and a first shading rate is received and a second sampling mode is determined for the primitive fragment. Shaded samples corresponding to the primitive fragment are generated, at a second shading rate, using the second sampling mode and the shaded samples are stored in a target buffer.

Type: Application

Filed: July 19, 2013

Publication date: January 22, 2015

Inventors: Eric B. Lum, Rouslan L. Dimitrov, Ignacio Llamas, Patrick James Neill, Yury Uralsky, Albert Meixner
Low latency concurrent computation

Patent number: 8928677

Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.

Type: Grant

Filed: January 24, 2012

Date of Patent: January 6, 2015

Assignee: NVIDIA Corporation

Inventors: Daniel Elliot Wexler, Jeffrey A. Bolz, Jesse David Hall, Philip Alexander Cuadra, Naveen Leekha, Ignacio Llamas
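
The scheduling effect described in the abstract above (patent 8928677) can be approximated with two command buffers that feed the same processor, where the low latency buffer is always drained first. The step-based arrival model and the function name run are illustrative assumptions, not the patent's mechanism.

```python
from collections import deque


def run(generic_tasks, low_latency_arrivals):
    """Execute one task per step, preferring the low latency buffer.

    generic_tasks:        list of task names queued up front.
    low_latency_arrivals: dict mapping a step number to the low latency
                          tasks that arrive at that step.
    Returns the order in which tasks were executed.
    """
    generic = deque(generic_tasks)
    low = deque()
    order = []
    step = 0
    while generic or low or any(s >= step for s in low_latency_arrivals):
        low.extend(low_latency_arrivals.get(step, []))
        if low:
            order.append(low.popleft())      # low latency work goes first
        elif generic:
            order.append(generic.popleft())  # otherwise continue generic work
        step += 1
    return order


order = run(["g0", "g1", "g2"], {1: ["ll0"]})
```

Here the low latency task arriving at step 1 runs immediately, ahead of the generic tasks still waiting: the resulting order is g0, ll0, g1, g2.
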
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR A TWO-PHASE QUEUE

Publication number: 20140380002

Abstract: A system, method, and computer program product are provided for accessing a queue. The method includes receiving a first request to reserve a data record entry in a queue, updating a queue state block based on the first request, and returning a response to the request. A second request is received to commit the data record entry and the queue state block is updated based on the second request.

Type: Application

Filed: June 19, 2013

Publication date: December 25, 2014

Inventors: William J. Dally, James David Balfour, Ignacio Llamas Ubieto
APPLICATION PROGRAMMING INTERFACE TO ENABLE THE CONSTRUCTION OF PIPELINE PARALLEL PROGRAMS

Publication number: 20140351826

Abstract: An application programming interface (API) provides various software constructs that allow a developer to assemble a processing pipeline having arbitrary structure and complexity. Once assembled, the processing pipeline is configured to include a set of interconnected pipestages. Those pipestages are associated with one or more different CTAs that may execute in parallel with one another on a parallel processing unit. The developer specifies the configuration of the pipestages, including the configuration of the different CTAs across all pipestages, as well as the different processing operations performed by each different CTA.

Type: Application

Filed: May 21, 2013

Publication date: November 27, 2014

Applicant: NVIDIA CORPORATION

Inventor: Ignacio LLAMAS
APPLICATION PROGRAMMING INTERFACE TO ENABLE THE CONSTRUCTION OF PIPELINE PARALLEL PROGRAMS

Publication number: 20140351827

Abstract: An application programming interface (API) provides various software constructs that allow a developer to assemble a processing pipeline having arbitrary structure and complexity. Once assembled, the processing pipeline is configured to include a set of interconnected pipestages. Those pipestages are associated with one or more different CTAs that may execute in parallel with one another on a parallel processing unit. The developer specifies the configuration of the pipestages, including the configuration of the different CTAs across all pipestages, as well as the different processing operations performed by each different CTA.

Type: Application

Filed: May 21, 2013

Publication date: November 27, 2014

Applicant: NVIDIA CORPORATION

Inventor: Ignacio LLAMAS
