Patents by Inventor Alexander L. Minkin

Alexander L. Minkin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240069736
    Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation, a source processing unit transmits multiple segments of data to a destination processing unit. For each segment, the source processing unit transmits to the destination processing unit a remote memory operation that includes metadata identifying the memory location of a corresponding synchronization object. The remote memory operation and the metadata are transmitted as a single unit to the destination processing unit, which splits that unit into the remote memory operation and a memory synchronization operation. As a result, the source processing unit avoids issuing a separate memory synchronization operation, thereby reducing inter-processor communications and increasing the performance of remote memory operations.
    Type: Application
    Filed: August 31, 2022
    Publication date: February 29, 2024
    Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
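    A minimal C++ sketch of the fused write-plus-synchronization flow described in the abstract above; the message layout, the field names, and the counter used as the synchronization object are illustrative assumptions, not the patented wire format:

      #include <atomic>
      #include <cstdint>
      #include <cstring>
      #include <vector>

      // Hypothetical fused message: a data segment travels together with
      // metadata naming the synchronization object to update on arrival.
      struct RemoteWrite {
          uint64_t dst_offset;              // where this segment lands at the destination
          std::vector<uint8_t> payload;     // one segment of the larger transfer
          std::atomic<uint32_t>* sync_obj;  // synchronization object named by the metadata
      };

      // Destination side: split the single received unit into the memory write
      // and the synchronization update, so the source never sends a separate
      // synchronization message.
      void apply_remote_write(uint8_t* dst_base, const RemoteWrite& msg) {
          std::memcpy(dst_base + msg.dst_offset, msg.payload.data(), msg.payload.size());
          msg.sync_obj->fetch_add(1, std::memory_order_release);  // signal arrival
      }

      // A destination-side consumer waits until every segment has landed.
      void wait_for_segments(const std::atomic<uint32_t>& sync_obj, uint32_t expected) {
          while (sync_obj.load(std::memory_order_acquire) < expected) { /* spin */ }
      }

    The saving the abstract claims is visible in apply_remote_write: one received unit produces both the data write and the synchronization update.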
  • Publication number: 20230289190
    Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization, enabling strong scaling and smaller tile sizes.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash Bangalore PRABHAKAR, JR., Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. Minkin
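    A hedged C++ model of the multicast tracking idea in the abstract above; the core count, the target mask, and the single L2 fill are assumptions chosen to show how one request can serve several cores:

      #include <array>
      #include <bitset>
      #include <cstdint>
      #include <vector>

      constexpr int kCores = 8;  // assumed number of cooperating processor cores

      // One request names, via a mask, every core that wants a copy of the line,
      // so the backing cache sees a single fill instead of one fill per core.
      struct MulticastRequest {
          uint64_t line_addr;           // cache line to fetch once
          std::bitset<kCores> targets;  // cores that should receive a copy
      };

      struct Core { std::vector<uint64_t> received; };

      uint64_t fetch_line_from_l2(uint64_t addr) { return addr ^ 0x5A5A5A5A; }  // stand-in fill

      // Tracking-circuitry stand-in: one L2 access, fanned out to all targets.
      void service_multicast(const MulticastRequest& req, std::array<Core, kCores>& cores) {
          uint64_t data = fetch_line_from_l2(req.line_addr);  // single L2 access
          for (int c = 0; c < kCores; ++c)
              if (req.targets.test(c)) cores[c].received.push_back(data);
      }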
  • Publication number: 20230289292
    Abstract: A parallel processing unit comprises a plurality of processors, each coupled to memory access hardware circuitry. Each memory access hardware circuit is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the circuit is one of a plurality of such circuits, each coupled to a respective one of the processors; and, in response to the memory access request, to translate the coordinate into plural memory addresses for the multidimensional data structure and, using the plural memory addresses, to asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or in an external memory.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Alexander L. Minkin, Alan Kaatz, Olivier Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
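    A rough host-side C++ analogue of the coordinate-to-addresses translation and asynchronous transfer described above; the row-major layout, the tile parameters, and the use of std::async in place of dedicated hardware are all assumptions for illustration:

      #include <cstddef>
      #include <cstring>
      #include <future>
      #include <vector>

      // Row-major 2D array; width and height are in elements.
      struct Tensor2D { const float* base; size_t width, height; };

      // Coordinate -> addresses: one address per row of the requested tile.
      std::vector<const float*> tile_row_addresses(const Tensor2D& t,
                                                   size_t x0, size_t y0, size_t tile_h) {
          std::vector<const float*> rows;
          for (size_t r = 0; r < tile_h; ++r)
              rows.push_back(t.base + (y0 + r) * t.width + x0);
          return rows;
      }

      // Asynchronous transfer: the copy proceeds while the caller keeps computing,
      // standing in for the hardware's asynchronous tile fetch into shared memory.
      std::future<void> async_tile_copy(const Tensor2D& t, size_t x0, size_t y0,
                                        size_t tile_w, size_t tile_h, float* dst) {
          auto rows = tile_row_addresses(t, x0, y0, tile_h);
          return std::async(std::launch::async, [rows, tile_w, tile_h, dst] {
              for (size_t r = 0; r < tile_h; ++r)
                  std::memcpy(dst + r * tile_w, rows[r], tile_w * sizeof(float));
          });
      }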
  • Publication number: 20230289304
    Abstract: A parallel processing unit comprises a plurality of processors, each coupled to memory access hardware circuitry. Each memory access hardware circuit is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the circuit is one of a plurality of such circuits, each coupled to a respective one of the processors; and, in response to the memory access request, to translate the coordinate into plural memory addresses for the multidimensional data structure and, using the plural memory addresses, to asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or in an external memory.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Alexander L. Minkin, Alan Kaatz, Olivier Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
  • Patent number: 9952977
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: April 24, 2018
    Assignee: NVIDIA Corporation
    Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
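    A small C++ sketch of how a cache-operations modifier might drive a replacement policy, as the abstract describes; the three modifier values and the policy fields are assumptions loosely modeled on publicly documented PTX cache operators, not the claimed encoding:

      #include <cstdint>

      // Assumed modifier values carried by the instruction.
      enum class CacheOp : uint8_t {
          CacheAll,       // cache at all levels (default)
          CacheGlobal,    // bypass L1, cache at L2
          CacheStreaming  // cache but mark evict-first
      };

      struct LinePolicy { bool fill_l1; bool fill_l2; bool evict_first; };

      // Replacement policy keyed off the per-instruction modifier.
      LinePolicy policy_for(CacheOp op) {
          switch (op) {
              case CacheOp::CacheAll:       return {true,  true,  false};
              case CacheOp::CacheGlobal:    return {false, true,  false};
              case CacheOp::CacheStreaming: return {true,  true,  true };
          }
          return {true, true, false};  // unreachable; silences compiler warnings
      }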
  • Patent number: 9595075
    Abstract: Approaches are disclosed for performing memory access operations in a texture processing pipeline having a first portion configured to process texture memory access operations and a second portion configured to process non-texture memory access operations. A texture unit receives a memory access request. The texture unit determines whether the memory access request includes a texture memory access operation. If the memory access request includes a texture memory access operation, then the texture unit processes the memory access request via at least the first portion of the texture processing pipeline; otherwise, the texture unit processes the memory access request via at least the second portion of the texture processing pipeline. One advantage of the disclosed approach is that the same processing and cache memory may be used for both texture operations and load/store operations to various other address spaces, leading to reduced surface area and power consumption.
    Type: Grant
    Filed: September 26, 2013
    Date of Patent: March 14, 2017
    Assignee: NVIDIA Corporation
    Inventors: Steven J. Heinrich, Eric T. Anderson, Jeffrey A. Bolz, Jonathan Dunaisky, Ramesh Jandhyala, Joel McCormack, Alexander L. Minkin, Bryon S. Nordquist, Poornachandra Rao
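    A minimal C++ dispatch sketch of the routing decision in the abstract above; the request kinds and the two path functions are placeholders standing in for the texture and load/store portions of the pipeline:

      #include <cstdint>

      enum class AccessKind { Texture, GlobalLoad, GlobalStore, Local };

      struct MemRequest { AccessKind kind; uint64_t addr_or_coord; };

      // Placeholder stages: filtering/LOD math vs. a plain address path.
      uint32_t run_texture_path(const MemRequest&)   { return 1; }
      uint32_t run_loadstore_path(const MemRequest&) { return 2; }

      // The texture unit inspects each request and steers it down the matching
      // portion of the pipeline; both portions share the same backing cache.
      uint32_t texture_unit_dispatch(const MemRequest& req) {
          if (req.kind == AccessKind::Texture)
              return run_texture_path(req);    // first portion: texture operations
          return run_loadstore_path(req);      // second portion: everything else
      }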
  • Patent number: 9286256
    Abstract: The invention sets forth an L1 cache architecture that includes a crossbar unit configured to transmit data associated with both read data requests and write data requests. Data associated with read data requests is retrieved from a cache memory and transmitted to the client subsystems. Similarly, data associated with write data requests is transmitted from the client subsystems to the cache memory. To allow for the transmission of both read and write data on the crossbar unit, an arbiter is configured to schedule the crossbar unit transmissions as well as to arbitrate between data requests received from the client subsystems.
    Type: Grant
    Filed: September 28, 2010
    Date of Patent: March 15, 2016
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls
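    A hedged C++ model of an arbiter sharing one crossbar between read returns and write data, as the abstract describes; the round-robin preference and the one-transfer-per-cycle simplification are assumptions:

      #include <cstdint>
      #include <deque>

      struct Transfer { int client; bool is_read; uint64_t line; };

      // Read data (cache -> client) and write data (client -> cache) compete for
      // the same crossbar wires, so the arbiter grants at most one per cycle and
      // alternates direction to avoid starving either queue.
      struct CrossbarArbiter {
          std::deque<Transfer> read_q, write_q;
          bool prefer_reads = true;

          bool step(Transfer& granted) {  // one crossbar cycle
              std::deque<Transfer>* first  = prefer_reads ? &read_q  : &write_q;
              std::deque<Transfer>* second = prefer_reads ? &write_q : &read_q;
              std::deque<Transfer>* src = !first->empty()  ? first
                                        : !second->empty() ? second : nullptr;
              if (!src) return false;     // crossbar idle this cycle
              granted = src->front();
              src->pop_front();
              prefer_reads = !prefer_reads;
              return true;
          }
      };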
  • Patent number: 9013498
    Abstract: A system and method for tracking and reporting texture map levels of detail that are computed during graphics processing allows for efficient management of texture map storage. Minimum and/or maximum pre-clamped texture map levels of detail values are tracked by a graphics processor and an array stored in memory is updated to report the minimum and/or maximum values for use by an application program. The minimum and/or maximum values may be used to determine the active set of texture map levels of detail that is loaded into graphics memory.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: April 21, 2015
    Assignee: NVIDIA Corporation
    Inventors: John S. Montrym, Andrew J. Tao, Henry P. Moreton, Emmett M. Kilgariff, Cass W. Everitt, Alexander L. Minkin, Eric Anderson, Yan Yan Tang, Jerome F. Duluk, Jr.
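    A short C++ sketch of the min/max level-of-detail reporting described above; the array layout and the sentinel initial values are assumptions:

      #include <algorithm>
      #include <vector>

      // One entry per texture; sentinel values mean "no LOD reported yet".
      struct LodRange { float min_lod = 1e30f; float max_lod = -1e30f; };

      // Called for each pre-clamped LOD the processor computes; the application
      // later reads the array to decide which mip levels must stay resident.
      void report_lod(std::vector<LodRange>& ranges, int texture_id, float lod) {
          LodRange& r = ranges[texture_id];
          r.min_lod = std::min(r.min_lod, lod);  // finest level touched
          r.max_lod = std::max(r.max_lod, lod);  // coarsest level touched
      }

    Only the mip levels between floor(min_lod) and ceil(max_lod) then need to be loaded into graphics memory, which is the storage saving the abstract points to.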
  • Publication number: 20150084975
    Abstract: Approaches are disclosed for performing memory access operations in a texture processing pipeline having a first portion configured to process texture memory access operations and a second portion configured to process non-texture memory access operations. A texture unit receives a memory access request. The texture unit determines whether the memory access request includes a texture memory access operation. If the memory access request includes a texture memory access operation, then the texture unit processes the memory access request via at least the first portion of the texture processing pipeline; otherwise, the texture unit processes the memory access request via at least the second portion of the texture processing pipeline. One advantage of the disclosed approach is that the same processing and cache memory may be used for both texture operations and load/store operations to various other address spaces, leading to reduced surface area and power consumption.
    Type: Application
    Filed: September 26, 2013
    Publication date: March 26, 2015
    Applicant: NVIDIA Corporation
    Inventors: Steven J. HEINRICH, Eric T. ANDERSON, Jeffrey A. BOLZ, Jonathan DUNAISKY, Ramesh JANDHYALA, Joel MCCORMACK, Alexander L. MINKIN, Bryon S. NORDQUIST, Poornachandra RAO
  • Patent number: 8817035
    Abstract: Circuits, methods, and apparatus that perform a context switch quickly while not wasting a significant amount of in-progress work. A texture pipeline includes a cutoff point or stage. After receipt of a context switch instruction, texture requests and state updates above the cutoff point are stored in a memory, while those below the cutoff point are processed before the context switch is completed. After this processing is complete, global states in the texture pipeline are stored in the memory. A previous context may then be restored by reading its texture requests and global states from the memory and loading them into the texture pipeline. The location of the cutoff point can be a point in the pipeline where a texture request can no longer result in a page fault in the memory.
    Type: Grant
    Filed: February 25, 2013
    Date of Patent: August 26, 2014
    Assignee: NVIDIA Corporation
    Inventor: Alexander L. Minkin
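    A simplified C++ model of the cutoff-point behavior in the abstract above; representing the pipeline as a queue with a per-request past_cutoff flag is an assumption standing in for the real pipeline stages:

      #include <cstdint>
      #include <deque>
      #include <vector>

      // A request is "past the cutoff" once it can no longer page-fault.
      struct TexRequest { uint64_t id; bool past_cutoff; };

      struct TexturePipeline {
          std::deque<TexRequest> stages;        // in-flight work, oldest first
          std::vector<uint64_t>  global_state;  // pipeline-wide state
      };

      struct SavedContext { std::vector<TexRequest> replay; std::vector<uint64_t> state; };

      // Requests past the cutoff drain to completion (their work is not wasted);
      // the rest are saved and replayed when this context is restored.
      SavedContext context_switch(TexturePipeline& p) {
          SavedContext saved;
          for (const TexRequest& r : p.stages) {
              if (!r.past_cutoff) saved.replay.push_back(r);
              // else: let it finish before completing the switch
          }
          p.stages.clear();
          saved.state = p.global_state;  // store global state once draining is done
          return saved;
      }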
  • Patent number: 8595425
    Abstract: One embodiment of the present invention sets forth a technique for providing an L1 cache that is a central storage resource. The L1 cache services multiple clients with diverse latency and bandwidth requirements and may be reconfigured to create multiple storage spaces, enabling the L1 cache to replace the dedicated buffers, caches, and FIFOs of previous architectures. A "direct mapped" storage region configured within the L1 cache may replace dedicated buffers, FIFOs, and interface paths, allowing clients of the L1 cache to exchange attribute and primitive data. The direct mapped storage region may be used as a global register file. A "local and global cache" storage region configured within the L1 cache may be used to support load/store memory requests to multiple spaces. These spaces include global, local, and call-return stack (CRS) memory.
    Type: Grant
    Filed: September 25, 2009
    Date of Patent: November 26, 2013
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Steven James Heinrich, Rajeshwaran Selvanesan, Brett W. Coon, Charles McCarver, Anjana Rajendran, Stewart G. Carlton
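    A minimal C++ sketch of carving one central L1 storage into a direct-mapped region and a cache region, as the abstract describes; the byte-granular split and the accessor names are assumptions:

      #include <cstddef>
      #include <vector>

      struct L1Config { size_t direct_mapped_bytes; size_t cache_bytes; };

      // One central RAM carved into a direct-mapped region (standing in for the
      // dedicated buffers, FIFOs, and interface paths it replaces) and a
      // local/global cache region; the split is fixed at configuration time.
      struct L1Storage {
          std::vector<char> ram;
          size_t direct_base = 0, cache_base = 0;

          explicit L1Storage(const L1Config& cfg)
              : ram(cfg.direct_mapped_bytes + cfg.cache_bytes),
                cache_base(cfg.direct_mapped_bytes) {}

          // Attribute/primitive exchange and global-register-file uses.
          char* direct_slot(size_t off) { return ram.data() + direct_base + off; }
          // Global, local, and call-return stack (CRS) spaces.
          char* cache_line(size_t off)  { return ram.data() + cache_base + off; }
      };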
  • Patent number: 8335892
    Abstract: One embodiment of the present invention sets forth a technique for arbitrating requests received by an L1 cache from multiple clients. The L1 cache outputs bubble requests to a first one of the multiple clients that cause the first one of the multiple clients to insert bubbles into the request stream, where a bubble is the absence of a request. The bubbles allow the L1 cache to grant access to another one of the multiple clients without stalling the first one of the multiple clients. The L1 cache services multiple clients with diverse latency and bandwidth requirements and may be reconfigured to provide memory spaces for clients executing multiple parallel threads, where the memory spaces each have a different scope.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: December 18, 2012
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Charles McCarver, Stewart Glenn Carlton, Anjana Rajendran
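    A hedged C++ model of the bubble protocol described above; the two-client setup and the bubbles_owed counter are illustrative assumptions:

      #include <deque>
      #include <optional>

      struct Request { int client; int payload; };

      // The cache asks the non-stallable primary client to insert bubbles
      // (empty slots) into its request stream; each bubble is a slot the
      // arbiter can hand to the secondary client without stalling the primary.
      struct BubbleArbiter {
          std::deque<Request> primary, secondary;
          int bubbles_owed = 0;

          void request_bubbles(int n) { bubbles_owed += n; }

          std::optional<Request> grant() {  // one arbitration cycle
              if (bubbles_owed > 0 && !secondary.empty()) {
                  --bubbles_owed;           // primary leaves this slot empty
                  Request r = secondary.front();
                  secondary.pop_front();
                  return r;
              }
              if (primary.empty()) return std::nullopt;
              Request r = primary.front();
              primary.pop_front();
              return r;
          }
      };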
  • Patent number: 8266382
    Abstract: One embodiment of the present invention sets forth a technique for arbitrating requests received from one of the multiple clients of an L1 cache and for providing hints to the client to assist in arbitration. The L1 cache services multiple clients with diverse latency and bandwidth requirements and may be reconfigured to provide memory spaces for clients executing multiple parallel threads, where the memory spaces each have a different scope.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: September 11, 2012
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Charles McCarver, Stewart Glenn Carlton, Anjana Rajendran, Yan Yan Tang
  • Patent number: 8266383
    Abstract: One embodiment of the present invention sets forth a technique for processing cache misses resulting from a request received from one of the multiple clients of an L1 cache. The L1 cache services multiple clients with diverse latency and bandwidth requirements, including at least one client whose requests cannot be stalled. The L1 cache includes storage to buffer pending requests for cache misses. When an entry is available to store a pending request, a request causing a cache miss is accepted. When the data for a read request becomes available, the cache instructs the client to resubmit the read request to receive the data. When an entry is not available to store a pending request, a request causing a cache miss is deferred and the cache provides the client with status information that is used to determine when the request should be resubmitted.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: September 11, 2012
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Charles McCarver, Stewart Glenn Carlton, Ming Y. Siu, Yan Yan Tang, Robert J. Stoll
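    A small C++ sketch of the miss-handling policy in the abstract above; the slot count, the map-based pending buffer, and the status values are assumptions:

      #include <cstdint>
      #include <unordered_map>

      enum class MissStatus { Accepted, DeferredRetryLater, DataReady };

      struct L1MissTracker {
          static constexpr size_t kPendingSlots = 4;   // assumed buffer depth
          std::unordered_map<uint64_t, bool> pending;  // line -> data arrived?

          MissStatus on_request(uint64_t line) {
              auto it = pending.find(line);
              if (it != pending.end())
                  return it->second ? MissStatus::DataReady   // resubmit succeeded
                                    : MissStatus::Accepted;   // fill still in flight
              if (pending.size() >= kPendingSlots)
                  return MissStatus::DeferredRetryLater;      // no entry free: defer
              pending.emplace(line, false);                   // allocate entry, start fetch
              return MissStatus::Accepted;
          }

          // Fill arrived: the client is now instructed to resubmit its read.
          void on_fill(uint64_t line) { pending[line] = true; }
      };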
  • Patent number: 8217954
    Abstract: Circuits, methods, and apparatus that provide texture caches and related circuits that store and retrieve texels in an efficient manner. One such texture circuit can provide a configurable number of texel quads for a configurable number of pixels. For bilinear filtering, texels for a comparatively greater number of pixels can be retrieved. For trilinear filtering, texels in a first LOD are retrieved for a number of pixels during a first clock cycle; during a second clock cycle, texels in a second LOD are retrieved. When aniso filtering is needed, a greater number of texels can be retrieved for a comparatively lower number of pixels.
    Type: Grant
    Filed: August 15, 2011
    Date of Patent: July 10, 2012
    Assignee: NVIDIA Corporation
    Inventor: Alexander L. Minkin
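    A rough C++ sketch of the per-cycle quad-fetch scheduling the abstract describes; the Quad layout and the two-cycle trilinear split are illustrative assumptions:

      #include <cstdint>
      #include <vector>

      struct Quad { uint64_t texels[4]; };  // a 2x2 block of neighboring texels

      Quad fetch_quad(int lod, float u, float v) {  // stand-in for a cache access
          return Quad{{static_cast<uint64_t>(lod), 0, 0, 0}};
      }

      // Bilinear needs one quad per pixel; trilinear needs a quad from each of
      // two LODs, so it spends a second cycle on the second LOD; aniso spends
      // the same quad bandwidth on fewer pixels.
      std::vector<Quad> fetch_trilinear(float u, float v, int lod) {
          std::vector<Quad> quads;
          quads.push_back(fetch_quad(lod, u, v));      // cycle 1: first LOD
          quads.push_back(fetch_quad(lod + 1, u, v));  // cycle 2: second LOD
          return quads;
      }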
  • Patent number: 8212835
    Abstract: One embodiment of the present invention sets forth a technique for transitioning from bilinear sampling to filter-4 sampling, while avoiding the visual artifacts along the boundary between the two different types of filters. The technique may be implemented using a linear transition function or an arbitrary transition function stored in a lookup table. The transition to filter-4 sampling occurs when the view of a textured object includes both minified and magnified levels of texture detail. Using this technique, high image quality is maintained for texture mapped images that include both highly minified pixels as well as highly magnified pixels, without suffering the performance penalty associated with using a filtering operation such as filter-4 sampling across all pixels.
    Type: Grant
    Filed: December 14, 2006
    Date of Patent: July 3, 2012
    Assignee: NVIDIA Corporation
    Inventors: Christopher J. Migdal, Alexander L. Minkin, Walter E. Donovan
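    A hedged C++ sketch of the linear transition function described above; the band width, the LOD sign convention (negative for magnification), and the stand-in filter implementations are assumptions:

      #include <algorithm>

      // Stand-in filter implementations; real ones sample the texture.
      float sample_bilinear(float u, float v) { return u + v; }
      float sample_filter4(float u, float v)  { return 0.5f * (u + v); }

      // Filter-4 is used when fully magnified, bilinear when minified, and the
      // two results are blended across a band of width kBand so no hard
      // boundary shows between the two filter types.
      float sample_with_transition(float u, float v, float lod) {
          const float kBand = 0.5f;  // assumed transition-band width in LOD units
          float w = std::clamp((lod + kBand) / kBand, 0.0f, 1.0f);  // 0 = filter-4, 1 = bilinear
          float f4 = sample_filter4(u, v);
          float bl = sample_bilinear(u, v);
          return f4 + w * (bl - f4);  // linear transition; a LUT could replace w
      }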
  • Publication number: 20110292065
    Abstract: Circuits, methods, and apparatus that provide texture caches and related circuits that store and retrieve texels in an efficient manner. One such texture circuit can provide a configurable number of texel quads for a configurable number of pixels. For bilinear filtering, texels for a comparatively greater number of pixels can be retrieved. For trilinear filtering, texels in a first LOD are retrieved for a number of pixels during a first clock cycle; during a second clock cycle, texels in a second LOD are retrieved. When aniso filtering is needed, a greater number of texels can be retrieved for a comparatively lower number of pixels.
    Type: Application
    Filed: August 15, 2011
    Publication date: December 1, 2011
    Applicant: NVIDIA Corporation
    Inventor: Alexander L. Minkin
  • Patent number: 7999821
    Abstract: Circuits, methods, and apparatus that provide texture caches and related circuits that store and retrieve texels in an efficient manner. One such texture circuit can provide a configurable number of texel quads for a configurable number of pixels. For bilinear filtering, texels for a comparatively greater number of pixels can be retrieved. For trilinear filtering, texels in a first LOD are retrieved for a number of pixels during a first clock cycle; during a second clock cycle, texels in a second LOD are retrieved. When aniso filtering is needed, a greater number of texels can be retrieved for a comparatively lower number of pixels.
    Type: Grant
    Filed: December 19, 2007
    Date of Patent: August 16, 2011
    Assignee: NVIDIA Corporation
    Inventor: Alexander L. Minkin
  • Patent number: 7948498
    Abstract: Circuits, methods, and apparatus that store a large number of texture states in an efficient manner. A level-one texture cache includes cache lines that are distributed throughout a texture pipeline, where each cache line stores a texture state. The cache lines can be updated by retrieving data from a second-level texture state cache, which in turn is updated from a frame buffer or graphics memory. The second-level texture state cache can prefetch texture states using a list of textures that are needed for a shader program or program portion.
    Type: Grant
    Filed: October 13, 2006
    Date of Patent: May 24, 2011
    Assignee: NVIDIA Corporation
    Inventor: Alexander L. Minkin
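    A minimal C++ sketch of the two-level texture-state caching and prefetch described above; the state fields and the map-backed second level are assumptions:

      #include <cstdint>
      #include <unordered_map>
      #include <vector>

      struct TextureState { uint32_t format, wrap, filter; };  // assumed fields

      struct TextureStateCaches {
          std::unordered_map<uint32_t, TextureState> l2;  // second-level state cache

          TextureState fetch_from_memory(uint32_t id) {   // stand-in for a frame-buffer read
              return TextureState{id, 0, 0};
          }

          // Warm the second level from the list of textures the next shader
          // program (or program portion) will use.
          void prefetch(const std::vector<uint32_t>& shader_texture_list) {
              for (uint32_t id : shader_texture_list)
                  if (!l2.count(id)) l2.emplace(id, fetch_from_memory(id));
          }

          // Update one of the cache lines distributed through the pipeline.
          TextureState l1_fill(uint32_t id) {
              auto it = l2.find(id);
              return it != l2.end() ? it->second : fetch_from_memory(id);
          }
      };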
  • Patent number: 7948495
    Abstract: Systems and methods for binding texture state stored in independent structures may be used by more than one graphics application programming interface (API). A texture header portion of the texture state defines texture data characteristics and is stored in a first structure. A texture sampler portion of the texture state specifies texture processing attributes and is stored in a second structure. A single unified structure is emulated for use by APIs that store the texture state in a single structure. Therefore, a graphics processor may support more than one graphics API for processing texture data.
    Type: Grant
    Filed: February 2, 2006
    Date of Patent: May 24, 2011
    Assignee: NVIDIA Corporation
    Inventors: Bryon S. Nordquist, Alexander L. Minkin
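    A short C++ sketch of the split texture state and the emulated unified structure described above; the field choices are illustrative assumptions:

      #include <cstdint>

      // Texture state split across two independent structures.
      struct TextureHeader  { uint32_t width, height, format; };  // data characteristics
      struct TextureSampler { uint32_t filter, wrap_u, wrap_v; }; // processing attributes

      // Emulated unified structure: APIs that expect one object bind a pairing
      // of the two; APIs that bind header and sampler separately use the pieces.
      struct UnifiedTexture {
          const TextureHeader*  header;
          const TextureSampler* sampler;
      };

      UnifiedTexture bind_unified(const TextureHeader& h, const TextureSampler& s) {
          return UnifiedTexture{&h, &s};
      }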