Patents by Inventor Amit Sabne

Amit Sabne has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11068247
    Abstract: Algorithms, examples, and related technology for automatic vectorization of a particular class of loops are described. The loops, denoted “CMMSR loops”, operate to find an extremum and also utilize an index denoting the position of the extremum in an array or other multi-element input. CMMSR loops are identified in a language translator by matching a specified template or having a specified set of parsing results, or both. Generated vectorization code includes, for example, code to compute candidates for the extremum, code to select the same instance of the extremum as a scalar execution would when the input contains multiple instances, and wind-down code to compute an index expression based on the selected instance of the extremum. Vectorizations may execute on SIMD hardware or other vector processors.
    Type: Grant
    Filed: February 6, 2018
    Date of Patent: July 20, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Amit Sabne, James J. Radigan
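
The entry above describes vectorizing loops that compute an extremum together with the index of its position. As a minimal sketch only (the argmax choice, the 4-lane blocking, and all names below are illustrative assumptions of this listing, not the patent's generated code), the following C++ shows a scalar conditional-max-with-index loop next to a hand-blocked version with per-lane extremum candidates and wind-down code that selects the same instance a scalar execution would select:

    #include <cstdio>
    #include <limits>

    // Scalar CMMSR-style loop: find the maximum and the index of its first
    // occurrence (the reduction carries both the extremum and its position).
    static int argmax_scalar(const int* a, int n) {
        int best = std::numeric_limits<int>::min(), best_idx = 0;
        for (int i = 0; i < n; ++i)
            if (a[i] > best) { best = a[i]; best_idx = i; }
        return best_idx;
    }

    // Hand-blocked 4-lane version sketching what generated vector code might do:
    // per-lane candidates for the extremum, wind-down code that selects the same
    // instance a scalar execution would (the lowest index among lanes holding the
    // overall maximum), and a scalar remainder loop for the leftover elements.
    static int argmax_blocked(const int* a, int n) {
        constexpr int W = 4;                      // stand-in for the vector width
        int lane_best[W], lane_idx[W];
        for (int l = 0; l < W; ++l) {
            lane_best[l] = std::numeric_limits<int>::min();
            lane_idx[l] = 0;
        }
        int i = 0;
        for (; i + W <= n; i += W)                // main vector body
            for (int l = 0; l < W; ++l)
                if (a[i + l] > lane_best[l]) { lane_best[l] = a[i + l]; lane_idx[l] = i + l; }

        int best = std::numeric_limits<int>::min(), best_idx = 0;
        for (int l = 0; l < W; ++l)               // wind-down: prefer the earliest index on ties
            if (lane_best[l] > best || (lane_best[l] == best && lane_idx[l] < best_idx)) {
                best = lane_best[l]; best_idx = lane_idx[l];
            }
        for (; i < n; ++i)                        // scalar remainder
            if (a[i] > best) { best = a[i]; best_idx = i; }
        return best_idx;
    }

    int main() {
        int a[] = {3, 9, 1, 9, 4, 9, 2};
        std::printf("scalar=%d blocked=%d\n", argmax_scalar(a, 7), argmax_blocked(a, 7));
    }
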
  • Publication number: 20190243625
    Abstract: Algorithms, examples, and related technology for automatic vectorization of a particular class of loops are described. The loops, denoted “CMMSR loops”, operate to find an extremum and also utilize an index denoting the position of the extremum in an array or other multi-element input. CMMSR loops are identified in a language translator by matching a specified template or having a specified set of parsing results, or both. Generated vectorization code includes, for example, code to compute candidates for the extremum, code to select the same instance of the extremum as a scalar execution would when the input contains multiple instances, and wind-down code to compute an index expression based on the selected instance of the extremum. Vectorizations may execute on SIMD hardware or other vector processors.
    Type: Application
    Filed: February 6, 2018
    Publication date: August 8, 2019
    Inventors: Amit Sabne, James J. Radigan
  • Patent number: 9747107
    Abstract: A system and method for compiling or runtime executing a fork-join data parallel program with function calls. In one embodiment, the system includes: (1) a partitioner operable to partition groups into a master group and at least one worker group and (2) a thread designator associated with the partitioner and operable to designate only one thread from the master group for execution and all threads in the at least one worker group for execution.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: August 29, 2017
    Assignee: Nvidia Corporation
    Inventors: Yuan Lin, Gautam Chakrabarti, Jaydeep Marathe, Okwan Kwon, Amit Sabne
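
The partitioning described above can be pictured with a small host-side simulation. In this sketch, the group and lane counts and the convention that group 0 is the master group are assumptions of this listing, not the patent's: a sequential region designates only one thread of the master group, while a parallel region designates every thread of the worker groups.

    #include <cstdio>
    #include <initializer_list>

    // Threads are organized into a master group and worker groups; a thread
    // designator decides which of them execute a given region.
    struct ThreadId { int group; int lane; };
    enum class Region { Sequential, Parallel };

    static bool designated(ThreadId t, Region r) {
        const bool in_master_group = (t.group == 0);   // assumed convention
        if (r == Region::Sequential)
            return in_master_group && t.lane == 0;     // the single master thread
        return !in_master_group;                       // all worker-group threads
    }

    int main() {
        const int groups = 3, lanes = 4;
        for (Region r : {Region::Sequential, Region::Parallel}) {
            std::printf("%s region:\n", r == Region::Sequential ? "sequential" : "parallel");
            for (int g = 0; g < groups; ++g)
                for (int l = 0; l < lanes; ++l)
                    if (designated({g, l}, r))
                        std::printf("  thread (group=%d, lane=%d) executes\n", g, l);
        }
    }
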
  • Patent number: 9727338
    Abstract: A system and method of translating functions of a program. In one embodiment, the system includes: (1) a local-scope variable identifier operable to identify local-scope variables employed in at least some of the functions as being either thread-shared local-scope variables or thread-private local-scope variables and (2) a function translator associated with the local-scope variable identifier and operable to translate at least some of the functions to cause thread-shared memory to be employed to store the thread-shared local-scope variables and thread-private memory to be employed to store the thread-private local-scope variables.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: August 8, 2017
    Assignee: Nvidia Corporation
    Inventors: Yuan Lin, Gautam Chakrabarti, Jaydeep Marathe, Okwan Kwon, Amit Sabne
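
As a rough illustration of the classification step described above, a tiny classifier might look like the following. The criteria used here (visibility outside a parallel construct, or an escaping address) are simplifications assumed for the sketch, not the patent's analysis.

    #include <cstdio>
    #include <string>
    #include <vector>

    // A local-scope variable that may be seen by other threads is placed in
    // thread-shared storage; everything else stays in thread-private storage.
    struct LocalVar {
        std::string name;
        bool declared_outside_parallel;   // visible across the thread team
        bool address_escapes;             // pointer may be read by other threads
    };

    enum class Storage { ThreadShared, ThreadPrivate };

    static Storage classify(const LocalVar& v) {
        return (v.declared_outside_parallel || v.address_escapes)
                   ? Storage::ThreadShared
                   : Storage::ThreadPrivate;
    }

    int main() {
        std::vector<LocalVar> locals = {
            {"sum", true,  false},   // buffer shared by the team
            {"i",   false, false},   // per-thread loop index
            {"tmp", false, true},    // address handed to another thread
        };
        for (const auto& v : locals)
            std::printf("%s -> %s memory\n", v.name.c_str(),
                        classify(v) == Storage::ThreadShared ? "thread-shared"
                                                             : "thread-private");
    }
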
  • Patent number: 9710275
    Abstract: A system and method for allocating shared memory of differing properties to shared data objects and a hybrid stack data structure. In one embodiment, the system includes: (1) a hybrid stack creator configured to create, in the shared memory, a hybrid stack data structure with a lower portion having a more favorable property and a higher portion having a less favorable property and (2) a data object allocator associated with the hybrid stack creator and configured to allocate storage for a shared data object in the lower portion if the lower portion has a sufficient remaining capacity to contain the shared data object and alternatively allocate storage for the shared data object in the higher portion if the lower portion has an insufficient remaining capacity to contain the shared data object.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: July 18, 2017
    Assignee: Nvidia Corporation
    Inventors: Jaydeep Marathe, Yuan Lin, Gautam Chakrabarti, Okwan Kwon, Amit Sabne
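
The allocation policy described above is easy to sketch. In this illustrative C++, the portion sizes and the meaning of "more favorable" (here simply "faster") are assumptions of the sketch: the allocator prefers the lower portion and falls back to the higher portion once the lower one lacks remaining capacity.

    #include <cstddef>
    #include <cstdio>

    // Hybrid stack with a small "fast" lower portion and a larger "slow"
    // higher portion. Alignment is ignored to keep the sketch short.
    class HybridStack {
        static constexpr std::size_t kLowerSize  = 64;     // more favorable portion
        static constexpr std::size_t kHigherSize = 1024;   // less favorable portion
        unsigned char lower_[kLowerSize];
        unsigned char higher_[kHigherSize];
        std::size_t lower_top_ = 0, higher_top_ = 0;

    public:
        void* allocate(std::size_t bytes) {
            if (lower_top_ + bytes <= kLowerSize) {          // fits in the lower portion
                void* p = lower_ + lower_top_;
                lower_top_ += bytes;
                return p;
            }
            if (higher_top_ + bytes <= kHigherSize) {        // fall back to the higher portion
                void* p = higher_ + higher_top_;
                higher_top_ += bytes;
                return p;
            }
            return nullptr;                                  // hybrid stack exhausted
        }
    };

    int main() {
        HybridStack stack;
        void* a = stack.allocate(48);   // lands in the lower (fast) portion
        void* b = stack.allocate(48);   // no longer fits below, goes to the higher portion
        std::printf("a=%p b=%p\n", a, b);
    }
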
  • Patent number: 9436475
    Abstract: A system and method for executing sequential code in the context of a single-instruction, multiple-thread (SIMT) processor. In one embodiment, the system includes: (1) a pipeline control unit operable to create a group of counterpart threads of the sequential code, one of the counterpart threads being a master thread, remaining ones of the counterpart threads being slave threads and (2) lanes operable to: (2a) execute certain instructions of the sequential code only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions and (2b) broadcast branch conditions in the master thread to the slave threads.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: September 6, 2016
    Assignee: Nvidia Corporation
    Inventors: Gautam Chakrabarti, Yuan Lin, Jaydeep Marathe, Okwan Kwon, Amit Sabne
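
A small host-side simulation can illustrate the execution model described above. The lane count and the choice of lane 0 as the master are assumptions for the sketch, and real SIMT predication is only mimicked here: the master lane evaluates the branch condition, the condition is broadcast so every lane takes the same branch, and only the master performs the side-effecting work.

    #include <cstdio>

    int main() {
        constexpr int kLanes = 4;
        int counter = 0;                         // state owned by the sequential code

        for (int step = 0; step < 3; ++step) {
            // The master lane computes the branch condition for this step...
            bool keep_going_master = (counter < 2);
            // ...and it is broadcast to the slave lanes (simulated by copying).
            bool keep_going[kLanes];
            for (int lane = 0; lane < kLanes; ++lane) keep_going[lane] = keep_going_master;

            for (int lane = 0; lane < kLanes; ++lane) {
                bool is_master = (lane == 0);
                if (!keep_going[lane])
                    continue;                    // all lanes follow the same control flow
                if (is_master)
                    ++counter;                   // predicated: only the master has side effects
            }
            std::printf("step %d: counter=%d\n", step, counter);
        }
    }
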
  • Publication number: 20140129812
    Abstract: A system and method for executing sequential code in the context of a single-instruction, multiple-thread (SIMT) processor. In one embodiment, the system includes: (1) a pipeline control unit operable to create a group of counterpart threads of the sequential code, one of the counterpart threads being a master thread, remaining ones of the counterpart threads being slave threads and (2) lanes operable to: (2a) execute certain instructions of the sequential code only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions and (2b) broadcast branch conditions in the master thread to the slave threads.
    Type: Application
    Filed: December 21, 2012
    Publication date: May 8, 2014
    Applicant: Nvidia Corporation
    Inventors: Gautam Chakrabarti, Yuan Lin, Jaydeep Marathe, Okwan Kwon, Amit Sabne
  • Publication number: 20140130052
    Abstract: A system and method for compiling or runtime executing a fork-join data parallel program with function calls. In one embodiment, the system includes: (1) a partitioner operable to partition groups into a master group and at least one worker group and (2) a thread designator associated with the partitioner and operable to designate only one thread from the master group for execution and all threads in the at least one worker group for execution.
    Type: Application
    Filed: December 21, 2012
    Publication date: May 8, 2014
    Applicant: Nvidia Corporation
    Inventors: Yuan Lin, Gautam Chakrabarti, Jaydeep Marathe, Okwan Kwon, Amit Sabne
  • Publication number: 20140129783
    Abstract: A system and method for allocating shared memory of differing properties to shared data objects and a hybrid stack data structure. In one embodiment, the system includes: (1) a hybrid stack creator configured to create, in the shared memory, a hybrid stack data structure with a lower portion having a more favorable property and a higher portion having a less favorable property and (2) a data object allocator associated with the hybrid stack creator and configured to allocate storage for a shared data object in the lower portion if the lower portion has a sufficient remaining capacity to contain the shared data object and alternatively allocate storage for the shared data object in the higher portion if the lower portion has an insufficient remaining capacity to contain the shared data object.
    Type: Application
    Filed: December 21, 2012
    Publication date: May 8, 2014
    Applicant: Nvidia Corporation
    Inventors: Jaydeep Marathe, Gautam Chakrabarti, Yuan Lin, Okwan Kwon, Amit Sabne
  • Publication number: 20140130021
    Abstract: A system and method of translating functions of a program. In one embodiment, the system includes: (1) a local-scope variable identifier operable to identify local-scope variables employed in at least some of the functions as being either thread-shared local-scope variables or thread-private local-scope variables and (2) a function translator associated with the local-scope variable identifier and operable to translate at least some of the functions to cause thread-shared memory to be employed to store the thread-shared local-scope variables and thread-private memory to be employed to store the thread-private local-scope variables.
    Type: Application
    Filed: December 21, 2012
    Publication date: May 8, 2014
    Applicant: Nvidia Corporation
    Inventors: Yuan Lin, Gautam Chakrabarti, Jaydeep Marathe, Okwan Kwon, Amit Sabne