Patents by Inventor Lacky V. Shah

Lacky V. Shah has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20130120413
    Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
  • Publication number: 20130124838
    Abstract: One embodiment of the present invention sets forth a technique for instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
    Type: Application
    Filed: November 10, 2011
    Publication date: May 16, 2013
    Inventors: Lacky V. Shah, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Robert Ohannessian, Nicholas Wang, Christopher Lamb, Philip Alexander Cuadra, Timothy John Purcell
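The fallback described in this abstract can be pictured with a small toy model. This is only a sketch under stated assumptions, not the patented mechanism: the function name `preempt`, the per-instruction cost list, and the string mode labels are all hypothetical.

```python
def preempt(in_flight_costs, threshold):
    """Toy model: try to drain in-flight work, fall back if it takes too long.

    in_flight_costs: hypothetical completion cost of each in-flight instruction.
    threshold: hypothetical time budget for draining.
    Returns (mode, unfinished), where unfinished is the work whose state
    would have to be saved as context.
    """
    elapsed = 0
    for index, cost in enumerate(in_flight_costs):
        if elapsed + cost > threshold:
            # Draining would exceed the budget: switch to instruction-level
            # preemption and save the remaining work as context state.
            return "instruction_level", in_flight_costs[index:]
        elapsed += cost
    # Everything drained in time: the units are idle, so little state remains.
    return "cta_boundary", []
```

With a generous budget the model reports a compute thread array boundary preemption with nothing left to save; with a tight budget it reports instruction-level preemption and the unfinished work.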
  • Publication number: 20130120412
    Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
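The increment, decrement, and run-at-zero sequence in this abstract can be sketched as follows. The class `RefCountObject`, the helper `run_version_through`, and the stage names are hypothetical, and the model simplifies timing by letting every stage receive the version before any stage finishes its work.

```python
class RefCountObject:
    """Hypothetical reference count object: a counter plus an operation to run at zero."""

    def __init__(self, on_zero):
        self.count = 0
        self.on_zero = on_zero

    def increment(self):
        self.count += 1

    def decrement(self):
        self.count -= 1
        if self.count == 0:
            self.on_zero()  # operation specified by the reference count object


def run_version_through(stages, refcount):
    """Pass one state-object version down the stages, then finish its work."""
    holders = []
    for name, is_relevant in stages:
        if is_relevant:
            # The stage uses this version, so it takes a reference.
            refcount.increment()
            holders.append(name)
        # Relevant or not, the version is forwarded to the next stage.
    for _ in holders:
        # Each holder finishes its work and drops its reference.
        refcount.decrement()
    return holders


log = []
rc = RefCountObject(on_zero=lambda: log.append("free version 1"))
holders = run_version_through(
    [("front_end", True), ("raster", False), ("shader", True)], rc
)
```

The callback fires exactly once, when the last stage that held the version releases it.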
  • Publication number: 20130117760
    Abstract: One embodiment of the present invention sets forth a technique for instruction level execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. Any in-flight instructions that follow the preemption command in the processing pipeline are captured and stored in a processing task buffer to be reissued when the preempted program is resumed. The processing task buffer is designated as a high priority task to ensure the preempted instructions are reissued before any new instructions for the preempted context when execution of the preempted context is restored.
    Type: Application
    Filed: November 8, 2011
    Publication date: May 9, 2013
    Inventors: Philip Alexander Cuadra, Christopher Lamb, Lacky V. Shah
  • Publication number: 20130117758
    Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.
    Type: Application
    Filed: November 8, 2011
    Publication date: May 9, 2013
    Inventors: Philip Alexander Cuadra, Karim M. Abdalla, Jerome F. Duluk, Jr., Luke Durant, Gerald F. Luiz, Timothy John Purcell, Lacky V. Shah
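A minimal sketch of the counter-limited allocation described in this abstract, assuming a single shared counter and a flag that stands in for the exit instruction's release override. `ResourceCounter`, `run_array`, and `retain_for_child` are hypothetical names, not terms from the application.

```python
class ResourceCounter:
    """Hypothetical shared counter initialized to a resource limit."""

    def __init__(self, limit):
        self.available = limit

    def try_acquire(self, amount):
        if amount > self.available:
            return False  # launching would exceed the limit
        self.available -= amount
        return True

    def release(self, amount):
        self.available += amount


def run_array(counter, amount, retain_for_child=False):
    """Launch one thread array that needs `amount` resources.

    If retain_for_child is set, the exit path skips the release so that a
    child task can keep using the resources.
    """
    if not counter.try_acquire(amount):
        return False
    # ... the threads of the array would execute here ...
    if not retain_for_child:
        counter.release(amount)
    return True


counter = ResourceCounter(limit=4)
first = run_array(counter, 3)                          # acquires and releases
second = run_array(counter, 3, retain_for_child=True)  # keeps 3 for a child
third = run_array(counter, 3)                          # only 1 left, so it fails
counter.release(3)                                     # the child task finishes
```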
  • Publication number: 20130117751
    Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
    Type: Application
    Filed: November 9, 2011
    Publication date: May 9, 2013
    Inventors: Jerome F. Duluk, Jr., Lacky V. Shah, Sean J. Treichler
  • Publication number: 20130074088
    Abstract: One embodiment of the present invention sets forth a technique for dynamically scheduling and managing compute tasks with different execution priority levels. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes, such as round-robin, priority, and partitioned priority. Each group is maintained as a linked list of pointers to compute tasks that are encoded as queue metadata (QMD) stored in memory. A QMD encapsulates the state needed to execute a compute task. When a task is selected for execution by the scheduling circuitry, the QMD is removed from a group and transferred to a table of active compute tasks. Compute tasks are then selected from the active task table for execution by a streaming multiprocessor.
    Type: Application
    Filed: September 19, 2011
    Publication date: March 21, 2013
    Inventors: Timothy John Purcell, Lacky V. Shah, Jerome F. Duluk, Jr.
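The grouping by priority and the move into an active task table can be illustrated with a short sketch. It assumes the plain priority scheme only, uses a `deque` per group as a stand-in for the linked list of QMD pointers, and treats index 0 as the highest priority; the `Scheduler` class and its method names are hypothetical.

```python
from collections import deque


class Scheduler:
    """Hypothetical priority scheduler: one FIFO group per priority level."""

    def __init__(self, num_priorities):
        # groups[0] is assumed to be the highest priority level.
        self.groups = [deque() for _ in range(num_priorities)]
        self.active = []  # stand-in for the table of active compute tasks

    def submit(self, qmd, priority):
        self.groups[priority].append(qmd)

    def select(self):
        """Remove the next QMD from the best non-empty group and activate it."""
        for group in self.groups:
            if group:
                qmd = group.popleft()
                self.active.append(qmd)
                return qmd
        return None


sched = Scheduler(num_priorities=3)
sched.submit("qmd_a", priority=2)
sched.submit("qmd_b", priority=0)
sched.submit("qmd_c", priority=0)
order = [sched.select() for _ in range(4)]
```

Round-robin or partitioned priority would change only the loop inside `select`; the group and active-table structure stays the same.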
  • Publication number: 20130070760
    Abstract: One embodiment of the present invention is a control unit for distributing packets of work to one or more consumers of work. The control unit is configured to assign at least one processing domain from a set of processing domains to each consumer included in the one or more consumers, receive a plurality of packets of work from at least one producer of work, wherein each packet of work is associated with a processing domain from the set of processing domains, and a first packet of work associated with a first processing domain can be processed by the one or more consumers independently of a second packet of work associated with a second processing domain, identify a first consumer that has been assigned the first processing domain, and transmit the first packet of work to the first consumer for processing.
    Type: Application
    Filed: September 15, 2011
    Publication date: March 21, 2013
    Inventors: Lacky V. Shah, Sean J. Treichler, Abraham B. de Waal
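The domain-based routing in this abstract can be sketched as below. The sketch assumes each processing domain is owned by exactly one consumer and models a consumer as a plain list; `ControlUnit`, `assign`, and `distribute` are hypothetical names.

```python
class ControlUnit:
    """Hypothetical control unit that routes packets by processing domain."""

    def __init__(self):
        self.owner_of = {}  # processing domain -> consumer queue

    def assign(self, domain, consumer):
        self.owner_of[domain] = consumer

    def distribute(self, packets):
        for domain, payload in packets:
            # Identify the consumer assigned this domain and transmit to it.
            self.owner_of[domain].append(payload)


consumer_a, consumer_b = [], []
unit = ControlUnit()
unit.assign("domain0", consumer_a)
unit.assign("domain1", consumer_b)
unit.assign("domain2", consumer_b)  # a consumer may hold more than one domain
unit.distribute(
    [("domain0", "p0"), ("domain1", "p1"), ("domain2", "p2"), ("domain0", "p3")]
)
```

Packets within one domain keep their arrival order, while packets in different domains can be handled independently.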
  • Patent number: 8095782
    Abstract: Graphics processing elements are capable of processing multiple contexts simultaneously, reducing the need to perform time consuming context switches compared with processing a single context at a time. Processing elements of a graphics processing pipeline may be configured to support all of the multiple contexts or only a portion of the multiple contexts. Each processing element may be allocated to process a particular context or a portion of the multiple contexts in order to simultaneously process more than one context. The allocation of processing elements to the multiple contexts may be determined dynamically in order to improve graphics processing throughput.
    Type: Grant
    Filed: June 14, 2007
    Date of Patent: January 10, 2012
    Assignee: NVIDIA Corporation
    Inventors: John M. Danskin, Lacky V. Shah
  • Patent number: 7386861
    Abstract: A blocking system intercepts communications between a software program and an operating system in order to handle blocking and unblocking of event signals. The blocking system intercepts system calls to the operating system requesting the blocking and unblocking of event signals and keeps track of which event signals are blocked and unblocked without delivering the system calls to the operating system. The blocking system also intercepts event signals from the operating system and only allows unblocked event signals to pass to the software program. Blocked event signals received by the blocking system are discarded until the program unblocks the blocked event signals. After unblocking an event signal, the blocking system determines whether a corresponding event signal was previously received and blocked. If so, the blocking system transmits a signal indicating that the event corresponding to the event signal occurred.
    Type: Grant
    Filed: July 23, 2003
    Date of Patent: June 10, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: William B. Buzbee, James S. Mattson, Lacky V. Shah
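The bookkeeping this abstract describes (track blocked signals, drop blocked arrivals but remember that they occurred, and deliver one notification on unblock) can be sketched as follows. `BlockingSystem`, its method names, and the `deliver` callback are hypothetical; no real operating system calls are made.

```python
class BlockingSystem:
    """Hypothetical interposer between a program and the operating system."""

    def __init__(self, deliver):
        self.blocked = set()    # signals the program has asked to block
        self.pending = set()    # blocked signals that arrived meanwhile
        self.deliver = deliver  # callback that passes a signal to the program

    def block(self, signal):
        # Handled locally; the request never reaches the operating system.
        self.blocked.add(signal)

    def unblock(self, signal):
        self.blocked.discard(signal)
        if signal in self.pending:
            # The event occurred while blocked: report it once.
            self.pending.remove(signal)
            self.deliver(signal)

    def on_signal(self, signal):
        if signal in self.blocked:
            self.pending.add(signal)  # discard, but remember the occurrence
        else:
            self.deliver(signal)


received = []
system = BlockingSystem(deliver=received.append)
system.on_signal("SIGUSR1")  # not blocked: passed straight through
system.block("SIGUSR1")
system.on_signal("SIGUSR1")  # blocked: recorded as pending
system.on_signal("SIGUSR1")  # blocked again: still one pending entry
system.unblock("SIGUSR1")    # one notification is delivered
```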
  • Patent number: 6658486
    Abstract: A blocking system intercepts communications between a software program and an operating system in order to handle blocking and unblocking of event signals. The blocking system intercepts system calls to the operating system requesting the blocking and unblocking of event signals and keeps track of which event signals are blocked and unblocked without delivering the system calls to the operating system. The blocking system also intercepts event signals from the operating system and only allows unblocked event signals to pass to the software program. Blocked event signals received by the blocking system are discarded until the program unblocks the blocked event signals. After unblocking an event signal, the blocking system determines whether a corresponding event signal was previously received and blocked. If so, the blocking system transmits a signal indicating that the event corresponding to the event signal occurred.
    Type: Grant
    Filed: February 25, 1998
    Date of Patent: December 2, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: William B. Buzbee, James S. Mattson, Lacky V. Shah
  • Patent number: 6622300
    Abstract: The present invention is a system and method of using a kernel module to perform dynamic optimizations both of user programs and of the computer operating system kernel itself. The kernel module permits optimized translations to be shared across a computer system without emulation because the kernel module has the privileges necessary to write into the computer program text in shared user memory space. In addition, the kernel module can be used to optimize the kernel itself because it, too, is located in the kernel memory space.
    Type: Grant
    Filed: April 21, 1999
    Date of Patent: September 16, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Umesh Krishnaswamy, Lacky V. Shah
  • Publication number: 20020026534
    Abstract: A blocking system intercepts communications between a software program and an operating system in order to handle blocking and unblocking of event signals. The blocking system intercepts system calls to the operating system requesting the blocking and unblocking of event signals and keeps track of which event signals are blocked and unblocked without delivering the system calls to the operating system. The blocking system also intercepts event signals from the operating system and only allows unblocked event signals to pass to the software program. Blocked event signals received by the blocking system are discarded until the program unblocks the blocked event signals. After unblocking an event signal, the blocking system determines whether a corresponding event signal was previously received and blocked. If so, the blocking system transmits a signal indicating that the event corresponding to the event signal occurred.
    Type: Application
    Filed: February 25, 1998
    Publication date: February 28, 2002
    Inventors: William B. Buzbee, James S. Mattson, Lacky V. Shah
  • Patent number: 6327704
    Abstract: A computer-implemented system, method, and product are provided for multi-branch backpatching in a dynamic translator. Such backpatching typically increases the speed of execution of translated instructions by providing a direct control path from translated multi-branch-jump instructions to their translated target instructions. In one embodiment, the multi-branch backpatching dynamic translator undertakes backpatching on an “as-needed” basis at run time. That is, backpatching is done for those branch targets that are executed rather than for all branch targets, or rather than for those branch targets that are estimated or assumed will be executed. Such backpatching is accomplished in one embodiment by generating dynamic backpatching code specific to each translated multi-branch-jump instruction. A multi-branch jump, or switch, table of each multi-branch-jump instruction is initialized so that all entries direct control to the dynamic backpatching code for that instruction.
    Type: Grant
    Filed: August 6, 1998
    Date of Patent: December 4, 2001
    Assignee: Hewlett-Packard Company
    Inventors: James S. Mattson, Jr., Lacky V. Shah, William B. Buzbee, Manuel E. Benitez
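The as-needed backpatching of a switch table can be illustrated with a toy model in which table entries are Python callables. Every entry starts as a stub that translates its target on first use and then overwrites its own slot. `TranslatedSwitch`, the `translate` callback, and the addresses are hypothetical.

```python
class TranslatedSwitch:
    """Hypothetical translated multi-branch jump with a lazily patched table."""

    def __init__(self, targets, translate):
        self.targets = targets      # case index -> original target address
        self.translate = translate  # address -> callable translated code
        # Initially every entry directs control to the backpatching stub.
        self.table = [self._make_stub(i) for i in range(len(targets))]

    def _make_stub(self, index):
        def stub():
            translated = self.translate(self.targets[index])
            self.table[index] = translated  # backpatch this entry only
            return translated()
        return stub

    def jump(self, index):
        return self.table[index]()


translations = []


def translate(address):
    translations.append(address)  # record which targets were translated
    return lambda: f"ran {address:#x}"


switch = TranslatedSwitch([0x100, 0x200, 0x300], translate)
first = switch.jump(1)   # goes through the stub, which patches entry 1
second = switch.jump(1)  # goes directly to the translated target
```

Only the branch target that actually executes is translated and patched; the other entries still point at their stubs.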
  • Patent number: 6295644
    Abstract: The present invention relates to a method and an apparatus for patching program text to improve performance of applications running on a computer through the elimination of table lookup and emulation.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: September 25, 2001
    Assignee: Hewlett-Packard Company
    Inventors: Wei Hsu, Lacky V. Shah
  • Patent number: 6223339
    Abstract: The present invention is a system, method, and product for improving the speed of dynamic translation systems by efficiently positioning translated instructions in a computer memory unit. More specifically, the speed of execution of translated instructions, which is a factor of particular relevance to dynamic optimization systems, may be adversely affected by inefficient jumping between traces of translated instructions. The present invention efficiently positions the traces with respect to each other and with respect to “trampoline” instructions that redirect control flow from the traces. For example, trampoline instructions may redirect control flow to an instruction emulator if the target instruction has not been translated, or to the translation of a target instruction that has been translated. When a target instruction has been translated, a backpatcher of the invention may directly backpatch the jump to the target so that the trampoline instructions are no longer needed.
    Type: Grant
    Filed: September 8, 1998
    Date of Patent: April 24, 2001
    Assignee: Hewlett-Packard Company
    Inventors: Lacky V. Shah, James S. Mattson, Jr., William B. Buzbee
  • Patent number: 6205545
    Abstract: A run-time optimization strategy uses a trace picker to identify traces of program code in a native code pool, and a translator to translate the traces into a code cache where the traces are executed natively. Static branch prediction hints are encoded in branch instructions in the translated traces. A program module implementing the present invention is initialized with an empty code cache and a pool of instructions in a native code pool. The trace picker analyzes the instructions in the native code pool and identifies traces of instructions that tend to be executed as a group. When a trace is identified, basic blocks lying along the trace path are translated into a code cache, with static branch predictions encoded into the branch instructions of the basic blocks based on branching behavior observed when the trace is identified.
    Type: Grant
    Filed: April 30, 1998
    Date of Patent: March 20, 2001
    Assignee: Hewlett-Packard Company
    Inventors: Lacky V. Shah, James S. Mattson, Jr., William B. Buzbee
  • Patent number: 6189141
    Abstract: A computer-implemented system, method, and product are provided to designate and translate traces of original instructions of an executable file at run time based on dynamic evaluation of control flow through frequently executed traces of instructions. Such designation typically reduces unnecessary translations and optimizations, and thereby increases execution speed and reduces the usage of memory and other resources. The invention includes a hot trace identifier to identify frequently executed traces of instructions and a hot trace instrumenter to instrument such frequently executed traces so that control flow through them may be recorded. If the amount or rate of control flow through a frequently executed trace exceeds a threshold value, a hot trace selector is invoked to select a hot trace of original instructions including those of the frequently executed trace. The hot trace may be dynamically optimized.
    Type: Grant
    Filed: May 4, 1998
    Date of Patent: February 13, 2001
    Assignee: Hewlett-Packard Company
    Inventors: Manuel E. Benitez, James S. Mattson, Jr., William B. Buzbee, Lacky V. Shah
  • Patent number: 6164841
    Abstract: A method and apparatus for improving the process of software development by a dynamic software development tool. The present invention efficiently executes in a user process and provides software developers with a high performance tool for software optimization. The present invention may augment the user process code instructions at runtime and, for every series of machine instructions that the original user source code would have executed, a series of instructions may be executed that are semantically equivalent to the user process code instructions and are altered to optimize the user process code instructions. The present invention may use emulation or translation to alter the user process code instructions. The resulting process is executed in the user process space and advantageously maintains the original flow of instruction execution. The present invention employs a technique of dynamically translating code at runtime and may operate on a virtual machine or a hardware machine.
    Type: Grant
    Filed: May 4, 1998
    Date of Patent: December 26, 2000
    Assignee: Hewlett-Packard Company
    Inventors: James S. Mattson, Jr., Lacky V. Shah, William B. Buzbee, Manuel E. Benitez
  • Patent number: 6148437
    Abstract: A computer-implemented system and method are provided to designate traces of original instructions of an executable file at run time based on evaluations of control flow through jump instructions. Such designation typically increases the opportunities for dynamic optimization based on loop unrolling and other modifications of the control-flow structure of the executable file. The target of a jump instruction is designated as the start of a trace if the number of times that control has passed to it through any one or more jump instructions of a predetermined type of jump instruction reaches a predetermined start-trace threshold. The trace is ended if the number of times that control has passed through jump instructions of one of a variety of particular types of jump instructions reaches an end-trace threshold that is predetermined for each such type of jump instruction. The invention includes an instruction emulator, a start-end designator, a trace translator and optimizer, and a backpatch manager.
    Type: Grant
    Filed: May 4, 1998
    Date of Patent: November 14, 2000
    Assignee: Hewlett-Packard Company
    Inventors: Lacky V. Shah, James S. Mattson, Jr., William B. Buzbee
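The start-trace rule in this last abstract (designate a jump target as the start of a trace once control has reached it a threshold number of times) can be sketched as a simple counter. The sketch covers only the start threshold and ignores jump types and the end-trace rule; `StartTraceDesignator` and `record_jump` are hypothetical names.

```python
from collections import Counter


class StartTraceDesignator:
    """Hypothetical counter-based designator for trace start points."""

    def __init__(self, start_threshold):
        self.start_threshold = start_threshold
        self.arrivals = Counter()  # jump target -> times control arrived
        self.trace_starts = set()

    def record_jump(self, target):
        """Count one arrival; return True when the target is newly designated."""
        self.arrivals[target] += 1
        newly_hot = (
            target not in self.trace_starts
            and self.arrivals[target] >= self.start_threshold
        )
        if newly_hot:
            self.trace_starts.add(target)
        return newly_hot


designator = StartTraceDesignator(start_threshold=3)
events = [designator.record_jump(t) for t in (0x40, 0x80, 0x40, 0x40, 0x40)]
```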