Patents by Inventor Douglas J. Hahn
Douglas J. Hahn has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10152329Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.Type: GrantFiled: February 9, 2012Date of Patent: December 11, 2018Assignee: NVIDIA CORPORATIONInventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
-
Patent number: 10007527Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.Type: GrantFiled: March 5, 2012Date of Patent: June 26, 2018Assignee: NVIDIA CORPORATIONInventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
-
Patent number: 9817668Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.Type: GrantFiled: December 16, 2011Date of Patent: November 14, 2017Assignee: NVIDIA CorporationInventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
-
Patent number: 9626191Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.Type: GrantFiled: December 22, 2011Date of Patent: April 18, 2017Assignee: NVIDIA CorporationInventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
-
Patent number: 9262174Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to perform read and write a memory accessing different numbers of bits per bank, e.g., 32-bits per bank, 64-bits per bank, or 128-bits per bank. On each clock cycle an access request may be received from one of the application programs and per processing thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accesses compared with using a single bank mapping for all accesses.Type: GrantFiled: April 5, 2012Date of Patent: February 16, 2016Assignee: NVIDIA CorporationInventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
-
Publication number: 20130268715Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to perform read and write a memory accessing different numbers of bits per bank, e.g., 32-bits per bank, 64-bits per bank, or 128-bits per bank. On each clock cycle an access request may be received from one of the application programs and per processing thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accesses compared with using a single bank mapping for all accesses.Type: ApplicationFiled: April 5, 2012Publication date: October 10, 2013Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
-
Publication number: 20130232322Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.Type: ApplicationFiled: March 5, 2012Publication date: September 5, 2013Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
-
Publication number: 20130212364Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.Type: ApplicationFiled: February 9, 2012Publication date: August 15, 2013Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
-
Publication number: 20130166877Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.Type: ApplicationFiled: December 22, 2011Publication date: June 27, 2013Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN
-
Publication number: 20130159684Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.Type: ApplicationFiled: December 16, 2011Publication date: June 20, 2013Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
-
Patent number: 7830386Abstract: Systems and methods for using a graphics processor as a coprocessor to a general purpose processor to perform register transfer level simulations may improve simulation performance compared with using only the general purpose processor. The internal state of memory elements of an RTL model of an electronic circuit are stored as surface data for each simulation timestep. Transform functions are used to determine a next state based on the current state and simulation inputs. The transfer functions are expressed as a graphics program, such as a shader or vertex program that may be executed by a programmable graphics processor.Type: GrantFiled: June 17, 2005Date of Patent: November 9, 2010Assignee: NVIDIA CorporationInventor: Douglas J. Hahn
-
Patent number: 7486290Abstract: A graphical shader and a method of distributing graphical data to shader pipelines in a graphical shader are disclosed. In accordance with the method, a shader pipeline input delay is set. Further, a group of the graphical data is distributed to a shader pipeline of the graphical shader to be processed. The method includes waiting for the shader pipeline input delay to elapse. After the shader pipeline input delay has elapsed, another group of the graphical data is distributed to another shader pipeline of the graphical shader to be processed. In another embodiment, a graphical shader includes a plurality of shader pipelines for processing graphical data. Further, the graphical shader includes a shader distributor for distributing a group of the graphical data to one of the shader pipelines and for distributing another group of the graphical data to another one of the shader pipelines after a shader pipeline input delay has elapsed.Type: GrantFiled: June 10, 2005Date of Patent: February 3, 2009Assignee: NVIDIA CorporationInventors: Emmett M. Kilgariff, Rui M. Bastos, Wei-Chao Chen, Douglas J. Hahn
-
Patent number: 5751951Abstract: A packet based data transmission system includes a flexible optimized nonocking transmit interface that incorporates optimized buffer modes, dynamic and static chaining, streaming and the utilization of small packet formats. Static chaining refers to connecting together the linked list for successive packets for the same transmit channel or virtual channel. Dynamic chaining refers to means by which the network interface performs this chaining automatically, thereby solving a blocking problem. On the transmit side, streaming refers to initiating the transmission of packet data before the entire packet data has been presented to the interface. This, in turn, permits more rapid recycling of the buffer space. On the receive side, streaming refers to initiating the processing of packet data before the entire packet has been received.Type: GrantFiled: October 30, 1995Date of Patent: May 12, 1998Assignee: Mitsubishi Electric Information Technology Center America, Inc.Inventors: Randy B. Osborne, John H. Howard, Ross T. Casley, Douglas J. Hahn