Patents by Inventor Aravind Natarajan

Aravind Natarajan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10261831
    Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for implementing speculative loop iteration partitioning (SLIP) for heterogeneous processing devices. A computing device may receive iteration information for a first partition of iterations of a repetitive process and select a SLIP heuristic based on available SLIP information and iteration information for the first partition. The computing device may determine a split value for the first partition using the SLIP heuristic, and partition the first partition using the split value to produce a plurality of next partitions.
    Type: Grant
    Filed: August 24, 2016
    Date of Patent: April 16, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Arun Raman, Han Zhao, Aravind Natarajan
  • Patent number: 10152243
    Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for implementing data flow management on a computing device. Embodiment methods may include initializing a buffer partition of a first memory of a first heterogeneous processing device for an output of execution of a first iteration of a first operation by the first heterogeneous processing device on which a first iteration of a second operation assigned for execution by a second heterogeneous processing device depends. Embodiment methods may include identifying a memory management operation for transmitting the output by the first heterogeneous processing device from the buffer partition as an input to the second heterogeneous processing device. Embodiment methods may include allocating a second memory for storing data for an iteration executed by a third heterogeneous processing device to minimize a number of memory management operations for the second allocated memory.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: December 11, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Han Zhao, Arun Raman, Aravind Natarajan
  • Patent number: 10114681
    Abstract: Embodiments include computing devices, systems, and methods identifying enhanced synchronization operation outcomes. A computing device may receive a first resource access request for a first resource of a computing device including a first requester identifier from a first computing element of the computing device. The computing device may also receive a second resource access request for the first resource including a second requester identifier from a second computing element of the computing device. The computing device may grant the first computing element access to the first resource based on the first resource access request, and return a response to the second computing element including the first requester identifier as a winner computing element identifier.
    Type: Grant
    Filed: March 30, 2016
    Date of Patent: October 30, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Dario Suarez Gracia, Gheorghe Cascaval, Han Zhao, Tushar Kumar, Aravind Natarajan, Arun Raman
  • Patent number: 10031697
    Abstract: Methods, devices, and non-transitory processor-readable storage media for a computing device to merge concurrent writes from a plurality of processing units to a buffer associated with an application. An embodiment method executed by a processor may include identifying a plurality of concurrent requests to access the buffer that are sparse, disjoint, and write-only, configuring a write-set for each of the plurality of processing units, executing the plurality of concurrent requests to access the buffer using the write-sets, determining whether each of the plurality of concurrent requests to access the buffer is complete, obtaining a buffer index and data via the write-set of each of the plurality of processing units, and writing to the buffer using the received buffer index and data via the write-set of each of the plurality of processing units in response to determining that each of the plurality of concurrent requests to access the buffer is complete.
    Type: Grant
    Filed: January 19, 2016
    Date of Patent: July 24, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Tushar Kumar, Aravind Natarajan, Dario Suarez Gracia
  • Publication number: 20180074727
    Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for implementing data flow management on a computing device. Embodiment methods may include initializing a buffer partition of a first memory of a first heterogeneous processing device for an output of execution of a first iteration of a first operation by the first heterogeneous processing device on which a first iteration of a second operation assigned for execution by a second heterogeneous processing device depends. Embodiment methods may include identifying a memory management operation for transmitting the output by the first heterogeneous processing device from the buffer partition as an input to the second heterogeneous processing device. Embodiment methods may include allocating a second memory for storing data for an iteration executed by a third heterogeneous processing device to minimize a number of memory management operations for the second allocated memory.
    Type: Application
    Filed: September 15, 2016
    Publication date: March 15, 2018
    Inventors: Han Zhao, Arun Raman, Aravind Natarajan
  • Publication number: 20180060130
    Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for implementing speculative loop iteration partitioning (SLIP) for heterogeneous processing devices. A computing device may receive iteration information for a first partition of iterations of a repetitive process and select a SLIP heuristic based on available SLIP information and iteration information for the first partition. The computing device may determine a split value for the first partition using the SLIP heuristic, and partition the first partition using the split value to produce a plurality of next partitions.
    Type: Application
    Filed: August 24, 2016
    Publication date: March 1, 2018
    Inventors: Arun Raman, Han Zhao, Aravind Natarajan
  • Publication number: 20180052776
    Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for implementing shared virtual index translation on a computing device. The computing device may receive a base virtual address for storing an output of a kernel function execution to a dedicated memory and determine whether the virtual address is in a range of virtual addresses for a privatized output buffer within the dedicated memory, which may be smaller than the dedicated memory. The computing device may calculate a first modified physical address using a physical address mapped to the base virtual address and an offset of a first processing device associated with the dedicated memory in response to determining that the base virtual address is in the range of virtual addresses. The computing device may store the output of the kernel function execution to the privatized output buffer at the first modified physical address.
    Type: Application
    Filed: August 18, 2016
    Publication date: February 22, 2018
    Inventors: Han Zhao, Arun Raman, Aravind Natarajan
  • Publication number: 20170286182
    Abstract: Embodiments include computing devices, systems, and methods identifying enhanced synchronization operation outcomes. A computing device may receive a first resource access request for a first resource of a computing device including a first requester identifier from a first computing element of the computing device. The computing device may also receive a second resource access request for the first resource including a second requester identifier from a second computing element of the computing device. The computing device may grant the first computing element access to the first resource based on the first resource access request, and return a response to the second computing element including the first requester identifier as a winner computing element identifier.
    Type: Application
    Filed: March 30, 2016
    Publication date: October 5, 2017
    Inventors: Dario Suarez Gracia, Gheorghe Cascaval, Han Zhao, Tushar Kumar, Aravind Natarajan, Arun Raman
  • Patent number: 9733978
    Abstract: Various embodiments include methods for data management in a computing device utilizing a plurality of processing units. Embodiment methods may include generating a data transfer heuristic model based on measurements from a plurality of sample data transfers between a plurality of data storage units. The generated data transfer heuristic model may be used to calculate data transfer costs for each of a plurality of tasks. The calculated data transfer costs may be used to schedule execution of the plurality of tasks in an execution order on selected ones of the plurality of processing units. The data transfer heuristic model may be updated based on measurements of data transfers occurring during the executions of the plurality of tasks (e.g., time, power consumption, etc.). Code executing on the processing units may indicate to a runtime when certain data blocks are no longer needed and thus may be evicted and/or pre-fetched for others.
    Type: Grant
    Filed: August 27, 2015
    Date of Patent: August 15, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Dario Suarez Gracia, Tushar Kumar, Aravind Natarajan, Ravish Hastantram, Gheorghe Calin Cascaval, Han Zhao
  • Publication number: 20170206035
    Abstract: Methods, devices, and non-transitory processor-readable storage media for a computing device to merge concurrent writes from a plurality of processing units to a buffer associated with an application. An embodiment method executed by a processor may include identifying a plurality of concurrent requests to access the buffer that are sparse, disjoint, and write-only, configuring a write-set for each of the plurality of processing units, executing the plurality of concurrent requests to access the buffer using the write-sets, determining whether each of the plurality of concurrent requests to access the buffer is complete, obtaining a buffer index and data via the write-set of each of the plurality of processing units, and writing to the buffer using the received buffer index and data via the write-set of each of the plurality of processing units in response to determining that each of the plurality of concurrent requests to access the buffer is complete.
    Type: Application
    Filed: January 19, 2016
    Publication date: July 20, 2017
    Inventors: Tushar Kumar, Aravind Natarajan, Dario Suarez Gracia
  • Publication number: 20170060633
    Abstract: Various embodiments include methods for data management in a computing device utilizing a plurality of processing units. Embodiment methods may include generating a data transfer heuristic model based on measurements from a plurality of sample data transfers between a plurality of data storage units. The generated data transfer heuristic model may be used to calculate data transfer costs for each of a plurality of tasks. The calculated data transfer costs may be used to schedule execution of the plurality of tasks in an execution order on selected ones of the plurality of processing units. The data transfer heuristic model may be updated based on measurements of data transfers occurring during the executions of the plurality of tasks (e.g., time, power consumption, etc.). Code executing on the processing units may indicate to a runtime when certain data blocks are no longer needed and thus may be evicted and/or pre-fetched for others.
    Type: Application
    Filed: August 27, 2015
    Publication date: March 2, 2017
    Inventors: Dario Suarez Gracia, Tushar Kumar, Aravind Natarajan, Ravish Hastantram, Gheorghe Calin Cascaval, Han Zhao
  • Patent number: 9501328
    Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: November 22, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
  • Publication number: 20160292012
    Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
    Type: Application
    Filed: March 30, 2015
    Publication date: October 6, 2016
    Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
  • Publication number: 20160267005
    Abstract: Various embodiments include methods for reclaiming memory in a computing device that may include storing a first pointer pointing to a first memory location storing the beginning of a data structure in which a plurality of threads executing on the computing device may concurrently access the data structure and storing a second pointer pointing to the current beginning of the data structure. In response to performing an operation on the data structure that changes the location of the beginning of the data structure from the first memory location to a second memory location, the second pointer may be updated to point to the second memory location. In response to determining that memory allocated to the data structure may be reclaimed, memory allocated to the data structure, including memory located at the first memory location pointed to by the first pointer, may be reclaimed.
    Type: Application
    Filed: August 12, 2015
    Publication date: September 15, 2016
    Inventors: Aravind Natarajan, Gheorghe Calin Cascaval