Patents by Inventor Arun Raman

Arun Raman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9778961
    Abstract: Methods, devices, systems, and non-transitory process-readable storage media for a multi-processor computing device to schedule multi-versioned tasks on a plurality of processing units. An embodiment method may include processor-executable operations for enqueuing a specialized version of a multi-versioned task in a task queue for each of the plurality of processing units, wherein each specialized version is configured to be executed by a different processing unit of the plurality of processing units, providing ownership over the multi-versioned task to a first processing unit when the first processing unit is available to immediately execute a corresponding specialized version of the multi-versioned task, and discarding other specialized versions of the multi-versioned task in response to providing ownership over the multi-versioned task to the first processing unit. Various operations of the method may be performed via a runtime functionality.
    Type: Grant
    Filed: September 14, 2015
    Date of Patent: October 3, 2017
    Assignee: QUALCOMM Incorporated
    Inventor: Arun Raman
  • Patent number: 9710315
    Abstract: A computing device may be configured to generate and execute a task that includes one or more blocking constructs that each encapsulate a blocking activity and a notification handler corresponding to each blocking activity. The computing device may launch the task, execute one or more of the blocking constructs, register the corresponding notification handler for the blocking activity that will be executed next with the runtime system, perform the blocking activity encapsulated by the blocking construct to request information from an external resource, cause the task to enter a blocked state while it waits for a response from the external resource, receive an unblocking notification from an external entity, and invoke the registered notification handler to cause the task to exit the blocked state and/or perform clean up operations to exit/terminate the task gracefully.
    Type: Grant
    Filed: January 19, 2015
    Date of Patent: July 18, 2017
    Inventors: Tushar Kumar, Pablo Montesinos Ortego, Arun Raman
  • Patent number: 9678790
    Abstract: A method and computing device, for enabling selective enforcement of complex task dependencies. The method and allows a computing device to determine whether to enforce task-dependencies based on programmer or end-user goals concerning efficiency and quality of runtime experience. A computing device may be configured to schedule executing a first task, identify an operation (e.g., a “+>” operation) of the first task as being selectively dependent on a second task finishing execution, and determining whether to enforce the dependency of the first task on the second task based on an evaluation of one or more enforcement conditions. If the enforcement conditions are not met, enforcing the dependency, executing the second task, and withholding execution of the first task until execution of the second task has finished. If the enforcement conditions are met, commencing execution of the first task prior to, or parallel to the second task finishing execution.
    Type: Grant
    Filed: July 7, 2015
    Date of Patent: June 13, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Arun Raman, Pablo Montesinos Ortego
  • Publication number: 20170109217
    Abstract: Embodiments include computing devices, apparatus, and methods implemented by a computing device for task scheduling in the presence of task conflict edges on a computing device. The computing device may determine whether a first task and a second task are related by a task conflict edge. In response to determining that the first task and the second task are related by the task conflict edge, the computing device may determine whether the second task acquires a resource required for execution of the first task and the second task. In response to determining that the second task fails to acquire the resource, the computing device may assign a dynamic task dependency edge from the first task to the second task.
    Type: Application
    Filed: October 16, 2015
    Publication date: April 20, 2017
    Inventors: Arun Raman, Tushar Kumar
  • Publication number: 20170109214
    Abstract: Embodiments include computing devices, apparatus, and methods implemented by a computing device for accelerating execution of a plurality of tasks belonging to a common property task graph. The computing device may identify a first successor task dependent upon a bundled task such that an available synchronization mechanism is a common property for the bundled task and the first successor task, and such that the first successor task only depends upon predecessor tasks for which the available synchronization mechanism is a common property. The computing device may add the first successor task to a common property task graph and add the plurality of tasks belonging to the common property task graph to a ready queue. The computing device may recursively identify successor tasks. The synchronization mechanism may include a synchronization mechanism for control logic flow or a synchronization mechanism for data access.
    Type: Application
    Filed: October 16, 2015
    Publication date: April 20, 2017
    Inventors: Arun Raman, Tushar Kumar
  • Publication number: 20170109195
    Abstract: Embodiments include computing devices, systems, and methods for task signaling on a computing device. Execution of a task by an initial thread on a critical path of execution may be interrupted to create at least one parallel task by the initial thread that can be executed in parallel with the task executed by the initial thread. An initial signal indicating the creation of the at least one parallel task to a relay thread may be sent by the initial thread. Execution of the task by the initial thread may resume before an acquisition of the at least one parallel task.
    Type: Application
    Filed: October 16, 2015
    Publication date: April 20, 2017
    Inventors: Arun Raman, Pablo Montesinos Ortego
  • Publication number: 20170083365
    Abstract: Methods, devices, and non-transitory process-readable storage media for dynamically adapting a frequency for detecting work-stealing operations in a multi-processor computing device. A method according to various embodiments and performed by a processor includes determining whether any work items of a cooperative task have been reassigned from a first processing unit to a second processing unit, calculating a chunk size using a default equation in response to determining that no work items of the cooperative task have been reassigned from the first processing unit, calculating the chunk size using a victim equation in response to determining that one or more work items of the cooperative task have been reassigned from the first processing unit, and executing a set of work items of the cooperative task that correspond to the calculated chunk size.
    Type: Application
    Filed: September 23, 2015
    Publication date: March 23, 2017
    Inventors: Han Zhao, Arun Raman, Pablo Montesinos Ortego
  • Publication number: 20170075734
    Abstract: Methods, devices, systems, and non-transitory process-readable storage media for a multi-processor computing device to schedule multi-versioned tasks on a plurality of processing units. An embodiment method may include processor-executable operations for enqueuing a specialized version of a multi-versioned task in a task queue for each of the plurality of processing units, wherein each specialized version is configured to be executed by a different processing unit of the plurality of processing units, providing ownership over the multi-versioned task to a first processing unit when the first processing unit is available to immediately execute a corresponding specialized version of the multi-versioned task, and discarding other specialized versions of the multi-versioned task in response to providing ownership over the multi-versioned task to the first processing unit. Various operations of the method may be performed via a runtime functionality.
    Type: Application
    Filed: September 14, 2015
    Publication date: March 16, 2017
    Inventor: Arun Raman
  • Publication number: 20170031728
    Abstract: Aspects include computing devices, systems, and methods for implementing scheduling and execution of lightweight kernels as simple tasks directly by a thread without setting up a task structure. A computing device may determine whether a task pointer in a task queue is a simple task pointer for the lightweight kernel. The computing device may schedule a first simple task for the lightweight kernel for execution by the thread. The computing device may retrieve, from an entry of a simple task table, a kernel pointer for the lightweight kernel. The entry in the simple task table may be associated with the simple task pointer. The computing device may directly execute the lightweight kernel as the simple task.
    Type: Application
    Filed: January 11, 2016
    Publication date: February 2, 2017
    Inventors: Han Zhao, Pablo Montesinos Ortego, Arun Raman, Behnam Robatmili, Gheorghe Calin Cascaval
  • Patent number: 9529643
    Abstract: A computing device (e.g., a mobile computing device, etc.) may be configured to may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
    Type: Grant
    Filed: January 26, 2015
    Date of Patent: December 27, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Arun Raman, Behnam Robatmili
  • Patent number: 9524170
    Abstract: A system includes a processor with a front end to receive an instruction stream reordered by a software scheduler and including a plurality of memory operations and alias information indicating how a given memory operation may be evaluated. Furthermore, the processor includes a hardware scheduler to reorder, in hardware, the instruction stream for out-of-order execution. In addition, the processor includes a calculation module to determine, for a given memory operation and based upon the alias information, a checking range of memory atoms subsequent to the given memory operation and a virtual order of the memory operation. The virtual order indicates an original ordering of the instructions. The processor also includes an alias unit to reorder the instruction stream, determine whether the hardware reordering caused an error, and determine whether the software reordering caused an error based upon the checking range and the virtual order.
    Type: Grant
    Filed: December 23, 2013
    Date of Patent: December 20, 2016
    Assignee: Intel Corporation
    Inventors: Rainer Theur, Arun Raman, Jaroslaw Topp, Rakesh Ranjan, Sebastian Winkel, Gregor Stellpflug, Ulrich Bretthauer
  • Patent number: 9501328
    Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: November 22, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
  • Publication number: 20160292012
    Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
    Type: Application
    Filed: March 30, 2015
    Publication date: October 6, 2016
    Inventors: Behnam ROBATMILI, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
  • Publication number: 20160217016
    Abstract: A computing device (e.g., a mobile computing device, etc.) may be configured to may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
    Type: Application
    Filed: January 26, 2015
    Publication date: July 28, 2016
    Inventors: Arun Raman, Behnam Robatmili
  • Publication number: 20160196162
    Abstract: A method and computing device, for enabling selective enforcement of complex task dependencies. The method and allows a computing device to determine whether to enforce task-dependencies based on programmer or end-user goals concerning efficiency and quality of runtime experience. A computing device may be configured to schedule executing a first task, identify an operation (e.g., a “+>” operation) of the first task as being selectively dependent on a second task finishing execution, and determining whether to enforce the dependency of the first task on the second task based on an evaluation of one or more enforcement conditions. If the enforcement conditions are not met, enforcing the dependency, executing the second task, and withholding execution of the first task until execution of the second task has finished. If the enforcement conditions are met, commencing execution of the first task prior to, or parallel to the second task finishing execution.
    Type: Application
    Filed: July 7, 2015
    Publication date: July 7, 2016
    Inventors: Arun Raman, Pablo Montesinos Ortego
  • Patent number: 9292446
    Abstract: A profiler may identify potentially-independent remote data accesses in a program. A remote data access is independent if value returned from said remote data access is not computed from another value returned from another remote data access appearing logically earlier in the program. A program rewriter may generate a program-specific prefetcher that preserves the behavior of the program, based on profiling information including the potentially-independent remote data accesses identified by the profiler. An execution engine may execute the prefetcher and the program concurrently. The execution engine may automatically decide which of said potentially-independent remote data accesses should be executed in parallel speculatively. A shared memory shared by the program and the prefetcher stores returned data from a data source as a result of issuing the remote data accesses.
    Type: Grant
    Filed: October 4, 2012
    Date of Patent: March 22, 2016
    Assignee: International Business Machines Corporation
    Inventors: Arun Raman, Martin Vechev, Mark N. Wegman, Eran Yahav, Greta Yorsh
  • Publication number: 20160078246
    Abstract: A computing device may be configured to generate and execute a task that includes one or more blocking constructs that each encapsulate a blocking activity and a notification handler corresponding to each blocking activity. The computing device may launch the task, execute one or more of the blocking constructs, register the corresponding notification handler for the blocking activity that will be executed next with the runtime system, perform the blocking activity encapsulated by the blocking construct to request information from an external resource, cause the task to enter a blocked state while it waits for a response from the external resource, receive an unblocking notification from an external entity, and invoke the registered notification handler to cause the task to exit the blocked state and/or perform clean up operations to exit/terminate the task gracefully.
    Type: Application
    Filed: January 19, 2015
    Publication date: March 17, 2016
    Inventors: Tushar Kumar, Pablo Montesinos Ortego, Arun Raman
  • Publication number: 20160055029
    Abstract: A computing device may be configured to commence or begin executing a first task via a first thread (e.g., in a first processor or core), begin executing a second task via a second thread (e.g., in a second processor or core), identify an operation of the second task as being dependent on the first task finishing execution, and change an operating state of the second task to “executed” prior to the first task finishing execution so as to allow the computing device to enforce task-dependencies while the second thread continues to process additional tasks. The computing device may begin executing a third task via the second thread (e.g., in a second processing core) prior to the first task finishing execution, and change the operating state of the second task to “finished” after the first task finishes.
    Type: Application
    Filed: January 26, 2015
    Publication date: February 25, 2016
    Inventors: Arun Raman, Pablo Montesinos Ortego
  • Publication number: 20150178090
    Abstract: A system includes a processor with a front end to receive an instruction stream reordered by a software scheduler and including a plurality of memory operations and alias information indicating how a given memory operation may be evaluated. Furthermore, the processor includes a hardware scheduler to reorder, in hardware, the instruction stream for out-of-order execution. In addition, the processor includes a calculation module to determine, for a given memory operation and based upon the alias information, a checking range of memory atoms subsequent to the given memory operation and a virtual order of the memory operation. The virtual order indicates an original ordering of the instructions. The processor also includes an alias unit to reorder the instruction stream, determine whether the hardware reordering caused an error, and determine whether the software reordering caused an error based upon the checking range and the virtual order.
    Type: Application
    Filed: December 23, 2013
    Publication date: June 25, 2015
    Inventors: Rainer Theuer, Arun Raman, Jaroslaw Topp, Rakesh Ranjan, Sebastian Winkel, Gregor Stellpflug, Ulrich Bretthauer
  • Publication number: 20140101278
    Abstract: A profiler may identify potentially-independent remote data accesses in a program. A remote data access is independent if value returned from said remote data access is not computed from another value returned from another remote data access appearing logically earlier in the program. A program rewriter may generate a program-specific prefetcher that preserves the behavior of the program, based on profiling information including the potentially-independent remote data accesses identified by the profiler. An execution engine may execute the prefetcher and the program concurrently. The execution engine may automatically decide which of said potentially-independent remote data accesses should be executed in parallel speculatively. A shared memory shared by the program and the prefetcher stores returned data from a data source as a result of issuing the remote data accesses.
    Type: Application
    Filed: October 4, 2012
    Publication date: April 10, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Arun Raman, Martin Vechev, Mark N. Wegman, Eran Yahav, Greta Yorsh