Patents by Inventor Dario Suarez Gracia
Dario Suarez Gracia has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10360063
Abstract: Various embodiments proactively balance workloads between a plurality of processing units of a multi-processor computing device by making work-stealing determinations based on operating state data. An embodiment method includes obtaining static characteristics data associated with each of a victim processor and one or more of a plurality of processing units that are ready to steal work items from the victim processor (work-ready processors), obtaining dynamic characteristics data for each of the processors, calculating priority values for each of the processors based on the obtained data, and transferring a number of work items assigned to the victim processor to a winning work-ready processor based on the calculated priority values. In some embodiments, the method may include acquiring control over a probabilistic lock for a shared data structure and updating the shared data structure to indicate the number of work items transferred to the winning work-ready processor.
Type: Grant
Filed: September 23, 2015
Date of Patent: July 23, 2019
Assignee: QUALCOMM Incorporated
Inventors: Han Zhao, Dario Suárez Gracia, Tushar Kumar
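The priority-based stealing decision described above can be sketched in Python. This is a minimal single-threaded illustration, not the patented method: the `Processor` fields, the priority formula (peak speed scaled by idle fraction), and the choice to move half of the victim's queue are all assumptions, and the probabilistic lock on the shared data structure is omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    # Static characteristic (assumed): relative peak throughput of the core.
    peak_speed: float
    # Dynamic characteristic (assumed): current utilization in [0, 1].
    utilization: float = 0.0
    work: deque = field(default_factory=deque)


def priority(p: Processor) -> float:
    # Hypothetical priority: faster, less busy processors score higher.
    return p.peak_speed * (1.0 - p.utilization)


def steal(victim: Processor, work_ready: list) -> tuple:
    """Move half of the victim's queued items to the highest-priority thief."""
    winner = max(work_ready, key=priority)
    count = len(victim.work) // 2
    for _ in range(count):
        # Take from the tail of the victim's queue, a common work-stealing choice.
        winner.work.append(victim.work.pop())
    return winner, count


victim = Processor("victim", peak_speed=1.0, utilization=1.0, work=deque(range(10)))
little = Processor("little", peak_speed=1.0, utilization=0.1)
big = Processor("big", peak_speed=2.0, utilization=0.5)
winner, moved = steal(victim, [little, big])
print(winner.name, moved)  # big 5
```

Here the faster but busier core still wins (priority 1.0 versus 0.9), which shows why combining static and dynamic data can change the outcome compared with using either alone.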
-
Patent number: 10114681
Abstract: Embodiments include computing devices, systems, and methods identifying enhanced synchronization operation outcomes. A computing device may receive a first resource access request for a first resource of a computing device including a first requester identifier from a first computing element of the computing device. The computing device may also receive a second resource access request for the first resource including a second requester identifier from a second computing element of the computing device. The computing device may grant the first computing element access to the first resource based on the first resource access request, and return a response to the second computing element including the first requester identifier as a winner computing element identifier.
Type: Grant
Filed: March 30, 2016
Date of Patent: October 30, 2018
Assignee: QUALCOMM Incorporated
Inventors: Dario Suarez Gracia, Gheorghe Cascaval, Han Zhao, Tushar Kumar, Aravind Natarajan, Arun Raman
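A minimal sketch of the winner-identifier response follows, assuming a single-threaded software arbiter. A real implementation would need the grant itself to be atomic (in hardware or via an atomic instruction), and the `ResourceArbiter` and `AccessResponse` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AccessResponse:
    granted: bool
    # Identifier of the computing element that holds the resource (the winner).
    winner_id: Optional[int]


class ResourceArbiter:
    """Grants a resource to the first requester and tells later requesters
    which element won, rather than returning a bare failure."""

    def __init__(self) -> None:
        self.owners: Dict[str, int] = {}

    def request(self, resource: str, requester_id: int) -> AccessResponse:
        # setdefault records the requester only if the resource is free.
        holder = self.owners.setdefault(resource, requester_id)
        return AccessResponse(granted=(holder == requester_id), winner_id=holder)

    def release(self, resource: str, requester_id: int) -> None:
        if self.owners.get(resource) == requester_id:
            del self.owners[resource]


arbiter = ResourceArbiter()
first = arbiter.request("lock0", requester_id=1)
second = arbiter.request("lock0", requester_id=2)
print(first, second)
```

The losing element receives `winner_id=1` instead of a plain "busy" result, information it could use when deciding how to proceed.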
-
Patent number: 10031697
Abstract: Methods, devices, and non-transitory processor-readable storage media for a computing device to merge concurrent writes from a plurality of processing units to a buffer associated with an application. An embodiment method executed by a processor may include identifying a plurality of concurrent requests to access the buffer that are sparse, disjoint, and write-only, configuring a write-set for each of the plurality of processing units, executing the plurality of concurrent requests to access the buffer using the write-sets, determining whether each of the plurality of concurrent requests to access the buffer is complete, obtaining a buffer index and data via the write-set of each of the plurality of processing units, and writing to the buffer using the received buffer index and data via the write-set of each of the plurality of processing units in response to determining that each of the plurality of concurrent requests to access the buffer is complete.
Type: Grant
Filed: January 19, 2016
Date of Patent: July 24, 2018
Assignee: QUALCOMM Incorporated
Inventors: Tushar Kumar, Aravind Natarajan, Dario Suarez Gracia
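The write-set merge can be sketched with Python threads standing in for processing units. The disjointness check, the `compute` callback, and the use of `ThreadPoolExecutor` are assumptions for illustration; the abstract does not specify how write-sets are stored.

```python
from concurrent.futures import ThreadPoolExecutor


def record_writes(indices, compute):
    # Each processing unit logs (buffer index, data) pairs in a private
    # write-set instead of writing to the shared buffer directly.
    return [(i, compute(i)) for i in indices]


def merged_parallel_write(buffer, partitions, compute):
    seen = set()
    for part in partitions:
        if seen & set(part):
            raise ValueError("write requests must be disjoint")
        seen |= set(part)
    with ThreadPoolExecutor() as pool:
        write_sets = list(pool.map(lambda part: record_writes(part, compute), partitions))
    # pool.map has returned every result, so all requests are complete;
    # only now are the write-sets merged into the buffer.
    for write_set in write_sets:
        for index, value in write_set:
            buffer[index] = value
    return buffer


buf = merged_parallel_write([0] * 8, [[1, 5], [2, 7]], lambda i: i * i)
print(buf)  # [0, 1, 4, 0, 0, 25, 0, 49]
```

Because the requests are disjoint and write-only, the order in which write-sets are applied does not affect the final buffer contents.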
-
Publication number: 20170286182
Abstract: Embodiments include computing devices, systems, and methods identifying enhanced synchronization operation outcomes. A computing device may receive a first resource access request for a first resource of a computing device including a first requester identifier from a first computing element of the computing device. The computing device may also receive a second resource access request for the first resource including a second requester identifier from a second computing element of the computing device. The computing device may grant the first computing element access to the first resource based on the first resource access request, and return a response to the second computing element including the first requester identifier as a winner computing element identifier.
Type: Application
Filed: March 30, 2016
Publication date: October 5, 2017
Inventors: Dario Suarez Gracia, Gheorghe Cascaval, Han Zhao, Tushar Kumar, Aravind Natarajan, Arun Raman
-
Patent number: 9740504
Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit on an inline cache pipeline connected to a processor pipeline.
Type: Grant
Filed: April 28, 2014
Date of Patent: August 22, 2017
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
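The cache check described above is easiest to follow in software terms. The Python sketch below models the lookup that such an accelerator would speed up; it does not model any hardware. `Shape` (a hidden class), the property-load site, and the polymorphic entry list are assumptions borrowed from common dynamic-language engines, not details from the abstract.

```python
class Shape:
    """Hidden class (assumed): maps property names to slot offsets."""

    def __init__(self, fields):
        self.offsets = {name: i for i, name in enumerate(fields)}


class Obj:
    def __init__(self, shape, values):
        self.shape = shape
        self.slots = list(values)


class InlineCache:
    """Software model of a polymorphic inline cache for one property-load site."""

    def __init__(self, name):
        self.name = name
        self.entries = []  # (shape, offset) pairs seen at this site
        self.misses = 0

    def load(self, obj):
        for shape, offset in self.entries:
            if obj.shape is shape:
                # Cached data is current for this shape: skip the lookup.
                return obj.slots[offset]
        # Cached data is not current: do the slow lookup and add an entry,
        # keeping the entries recorded for previously seen shapes.
        self.misses += 1
        offset = obj.shape.offsets[self.name]
        self.entries.append((obj.shape, offset))
        return obj.slots[offset]


point = Shape(["x", "y"])
swapped = Shape(["y", "x"])
ic = InlineCache("x")
values = [ic.load(Obj(point, [1, 2])), ic.load(Obj(point, [3, 4])),
          ic.load(Obj(swapped, [5, 6])), ic.load(Obj(point, [7, 8]))]
print(values, ic.misses)  # [1, 3, 6, 7] 2
```

Keeping older entries when a new shape appears loosely mirrors the abstract's note that a new inline cache may include data from a previously initialized instance.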
-
Patent number: 9733978
Abstract: Various embodiments include methods for data management in a computing device utilizing a plurality of processing units. Embodiment methods may include generating a data transfer heuristic model based on measurements from a plurality of sample data transfers between a plurality of data storage units. The generated data transfer heuristic model may be used to calculate data transfer costs for each of a plurality of tasks. The calculated data transfer costs may be used to schedule execution of the plurality of tasks in an execution order on selected ones of the plurality of processing units. The data transfer heuristic model may be updated based on measurements of data transfers occurring during the executions of the plurality of tasks (e.g., time, power consumption, etc.). Code executing on the processing units may indicate to a runtime when certain data blocks are no longer needed and thus may be evicted and/or pre-fetched for others.
Type: Grant
Filed: August 27, 2015
Date of Patent: August 15, 2017
Assignee: QUALCOMM Incorporated
Inventors: Dario Suarez Gracia, Tushar Kumar, Aravind Natarajan, Ravish Hastantram, Gheorghe Calin Cascaval, Han Zhao
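One simple form a data transfer heuristic model could take is a per-link linear fit of transfer time against size. The sketch below assumes that model (latency plus per-byte cost, fitted by least squares), treats unmeasured links as infinitely expensive, and schedules each task on the unit with the lowest transfer cost, ignoring compute cost. All names are hypothetical.

```python
from collections import defaultdict


class TransferModel:
    """Per-link linear model (assumed): seconds = latency + per_byte * nbytes."""

    def __init__(self):
        self.samples = defaultdict(list)  # (src, dst) -> [(nbytes, seconds)]

    def record(self, src, dst, nbytes, seconds):
        # Used both for initial sample transfers and for runtime updates.
        self.samples[(src, dst)].append((nbytes, seconds))

    def cost(self, src, dst, nbytes):
        if src == dst:
            return 0.0  # data already resident: no transfer needed
        pts = self.samples.get((src, dst))
        if not pts:
            return float("inf")  # unmeasured link treated as unusable (assumption)
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in pts)
        # Least-squares slope; fall back to a constant cost with one distinct size.
        slope = (sum((x - mean_x) * (y - mean_y) for x, y in pts) / var_x) if var_x else 0.0
        return (mean_y - slope * mean_x) + slope * nbytes


def schedule(tasks, units, model):
    """Assign each (name, data_location, nbytes) task to its cheapest unit."""
    return {name: min(units, key=lambda u: model.cost(loc, u, nbytes))
            for name, loc, nbytes in tasks}


model = TransferModel()
model.record("cpu", "gpu", 1000, 2.0)
model.record("cpu", "gpu", 2000, 3.0)
model.record("gpu", "cpu", 1000, 4.0)
model.record("gpu", "cpu", 3000, 6.0)
print(schedule([("t1", "cpu", 500), ("t2", "gpu", 100)], ["cpu", "gpu"], model))
# {'t1': 'cpu', 't2': 'gpu'}
```

Calling `record` with measurements taken while tasks run refines the fitted latency and per-byte cost, in the spirit of the model updates the abstract describes.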
-
Publication number: 20170206035
Abstract: Methods, devices, and non-transitory processor-readable storage media for a computing device to merge concurrent writes from a plurality of processing units to a buffer associated with an application. An embodiment method executed by a processor may include identifying a plurality of concurrent requests to access the buffer that are sparse, disjoint, and write-only, configuring a write-set for each of the plurality of processing units, executing the plurality of concurrent requests to access the buffer using the write-sets, determining whether each of the plurality of concurrent requests to access the buffer is complete, obtaining a buffer index and data via the write-set of each of the plurality of processing units, and writing to the buffer using the received buffer index and data via the write-set of each of the plurality of processing units in response to determining that each of the plurality of concurrent requests to access the buffer is complete.
Type: Application
Filed: January 19, 2016
Publication date: July 20, 2017
Inventors: Tushar Kumar, Aravind Natarajan, Dario Suarez Gracia
-
Patent number: 9710388
Abstract: Aspects include computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit on an inline cache pipeline connected to a processor pipeline.
Type: Grant
Filed: April 28, 2014
Date of Patent: July 18, 2017
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
-
Patent number: 9632569
Abstract: Multi-processor computing device methods manage resource accesses by a signaling event manager signaling processor elements requesting access to a resource to wake up to access the resource when the resource is available or wait for an event when the resource is busy. Processor elements may enter a sleep state while awaiting access to the requested resource. When multiple elements are waiting for the resource, the processor element with a highest assigned priority is signaled to wake up when the resource is available without waking other elements. Priorities may be assigned to processor elements waiting for the resource based on a heuristic or parameter that may depend on a state of the computing device or the processor elements. A sleep duration may be estimated for a processor element waiting for a resource and the processor element may be removed from a scheduling queue or assigned another thread during the sleep duration.
Type: Grant
Filed: August 5, 2014
Date of Patent: April 25, 2017
Assignee: QUALCOMM Incorporated
Inventors: Dario Suarez Gracia, Han Zhao, Pablo Montesinos Ortego, Gheorghe Calin Cascaval, James Xenidis
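The wake-one policy can be modeled without real threads. In this sketch, `request` returns `"sleep"` where a real processor element would block on an event, and `release` returns the single element that would be signaled. The string states and numeric priorities are assumptions, and the sleep-duration estimate is not modeled.

```python
import heapq
import itertools


class SignalingEventManager:
    """Single-threaded model of the wake-one policy: grant the resource when it
    is free, otherwise queue the requester as sleeping; on release, signal
    only the highest-priority sleeper."""

    def __init__(self):
        self.owner = None
        self._sleepers = []  # min-heap of (-priority, arrival order, element)
        self._arrival = itertools.count()

    def request(self, element, priority):
        if self.owner is None:
            self.owner = element
            return "run"
        heapq.heappush(self._sleepers, (-priority, next(self._arrival), element))
        return "sleep"

    def release(self):
        # Wake exactly one waiter (or none); the others stay asleep.
        self.owner = heapq.heappop(self._sleepers)[2] if self._sleepers else None
        return self.owner


mgr = SignalingEventManager()
print(mgr.request("a", 1), mgr.request("b", 1), mgr.request("c", 5))  # run sleep sleep
print(mgr.release(), mgr.release(), mgr.release())  # c b None
```

Negating the priority turns Python's min-heap into a max-priority queue, and the arrival counter breaks ties in first-come order so elements are never compared directly.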
-
Publication number: 20170083827
Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for accelerating machine learning on a computing device. Raw data may be received in the computing device from a raw data source device. The apparatus may identify key features as two-dimensional matrices of the raw data such that the key features are mutually exclusive from each other. The key features may be translated into key feature vectors. The computing device may generate a feature vector from at least one of the key feature vectors. The computing device may receive a first partial output resulting from an execution of a basic linear algebra subprogram (BLAS) operation using the feature vector and a weight factor. The first partial output may be combined with a plurality of partial outputs to produce an output matrix. Receiving the raw data on the computing device may include receiving streaming raw data.
Type: Application
Filed: September 23, 2015
Publication date: March 23, 2017
Inventors: Behnam Robatmili, Matthew Leslie Badin, Dario Suárez Gracia, Gheorghe Calin Cascaval, Nayeem Islam
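The flow from non-overlapping two-dimensional key features to partial outputs can be sketched in plain Python. Assumptions: each key feature is a square window of the input, the "weight factor" is a weight matrix, and the BLAS operation is a simplified GEMV (y = W x, i.e. alpha = 1 and beta = 0); a real system would call an optimized BLAS library instead.

```python
def gemv(weights, x):
    # y = W x, the computation of the BLAS level-2 GEMV routine, in plain Python.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def key_feature(image, top, left, size):
    """Flatten one size x size window of a 2-D input into a feature vector."""
    return [v for row in image[top:top + size] for v in row[left:left + size]]


def forward(image, weights, windows, size):
    # Each non-overlapping window yields one partial output; stacking the
    # partial outputs forms the output matrix (one row per window).
    return [gemv(weights, key_feature(image, top, left, size)) for top, left in windows]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
weights = [[1, 1, 1, 1], [1, 0, 0, 0]]  # sum of window, first element
windows = [(0, 0), (0, 2), (2, 0), (2, 2)]
print(forward(image, weights, windows, 2))  # [[14, 1], [22, 3], [46, 9], [54, 11]]
```

Because the windows do not overlap, each partial output can be computed independently, which is what makes it possible to combine them afterwards (or to process them as data streams in).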
-
Publication number: 20170083364
Abstract: Various embodiments proactively balance workloads between a plurality of processing units of a multi-processor computing device by making work-stealing determinations based on operating state data. An embodiment method includes obtaining static characteristics data associated with each of a victim processor and one or more of a plurality of processing units that are ready to steal work items from the victim processor (work-ready processors), obtaining dynamic characteristics data for each of the processors, calculating priority values for each of the processors based on the obtained data, and transferring a number of work items assigned to the victim processor to a winning work-ready processor based on the calculated priority values. In some embodiments, the method may include acquiring control over a probabilistic lock for a shared data structure and updating the shared data structure to indicate the number of work items transferred to the winning work-ready processor.
Type: Application
Filed: September 23, 2015
Publication date: March 23, 2017
Inventors: Han Zhao, Dario Suárez Gracia, Tushar Kumar
-
Publication number: 20170060633
Abstract: Various embodiments include methods for data management in a computing device utilizing a plurality of processing units. Embodiment methods may include generating a data transfer heuristic model based on measurements from a plurality of sample data transfers between a plurality of data storage units. The generated data transfer heuristic model may be used to calculate data transfer costs for each of a plurality of tasks. The calculated data transfer costs may be used to schedule execution of the plurality of tasks in an execution order on selected ones of the plurality of processing units. The data transfer heuristic model may be updated based on measurements of data transfers occurring during the executions of the plurality of tasks (e.g., time, power consumption, etc.). Code executing on the processing units may indicate to a runtime when certain data blocks are no longer needed and thus may be evicted and/or pre-fetched for others.
Type: Application
Filed: August 27, 2015
Publication date: March 2, 2017
Inventors: Dario Suarez Gracia, Tushar Kumar, Aravind Natarajan, Ravish Hastantram, Gheorghe Calin Cascaval, Han Zhao
-
Patent number: 9501328
Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
Type: Grant
Filed: March 30, 2015
Date of Patent: November 22, 2016
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
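The sub-partitioning idea can be illustrated with a deterministic simulation. This is not the patented controller: tasks take turns in rounds, `speeds` (iterations per round) is an assumption used to create imbalance, and a single `status` dictionary stands in for both the status table and the per-core local tables.

```python
def steal_half(status, idle):
    """Give the idle task the upper half of the largest remaining range."""
    victim = max(status.values(), key=lambda r: r[1] - r[0])
    remaining = victim[1] - victim[0]
    if remaining < 2:
        return False  # nothing worth sub-partitioning
    mid = victim[0] + remaining // 2
    idle[0], idle[1] = mid, victim[1]
    victim[1] = mid
    return True


def run_loop(n_iters, speeds, body):
    """Simulate tasks with different speeds sharing the iteration space [0, n_iters)."""
    n_tasks = len(speeds)
    size = -(-n_iters // n_tasks)  # ceiling division
    # Status table: task id -> [next iteration, end) still to execute.
    status = {t: [min(t * size, n_iters), min((t + 1) * size, n_iters)]
              for t in range(n_tasks)}
    order = []
    while any(lo < hi for lo, hi in status.values()):
        for t, rng in status.items():
            for _ in range(speeds[t]):  # iterations this task runs per round
                if rng[0] >= rng[1] and not steal_half(status, rng):
                    break
                body(rng[0])
                order.append(rng[0])
                rng[0] += 1
    return order


executed = run_loop(12, speeds=[1, 1, 4], body=lambda i: None)
print(sorted(executed) == list(range(12)))  # True
```

In this run the fast task finishes its initial partition in the first round, then takes iterations 3 and 7 by splitting the slower tasks' remaining ranges; every iteration still executes exactly once.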
-
Publication number: 20160292012
Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
Type: Application
Filed: March 30, 2015
Publication date: October 6, 2016
Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
-
Publication number: 20160216969
Abstract: Systems and methods for adaptively managing registers in an instruction processor are disclosed. The system identifies one or more registers with inoperable cells. An operand manager identifies a set of operable cells within the one or more registers with inoperable cells and determines if a present instruction will use an operand that can be supported by the set of operable cells. When the set of operable cells can support the operand, the operand manager generates an assignment which is communicated to a register file manager.
Type: Application
Filed: January 28, 2015
Publication date: July 28, 2016
Inventors: Dario Suarez Gracia, Behnam Robatmili
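A minimal sketch of the operand manager's check, assuming each register's inoperable cells are known as a bit mask and that a narrower operand is always placed in the low-order cells. The abstract does not say how operands are positioned, and the function names are hypothetical.

```python
def supports(fault_mask, width):
    # Assumed placement: the operand occupies the low `width` cells, so every
    # one of those cells must be operable (0 in the fault mask).
    return (fault_mask & ((1 << width) - 1)) == 0


def assign_register(fault_masks, width):
    """Return the first register whose operable cells can hold the operand."""
    for reg, mask in fault_masks.items():
        if supports(mask, width):
            return reg
    return None  # no faulty register fits; a fully working one would be needed


faulty = {"r1": 1 << 40, "r2": 1 << 3}  # r1: bad cell at bit 40, r2: bad cell at bit 3
print(assign_register(faulty, 32), assign_register(faulty, 64))  # r1 None
```

A 64-bit register with a single bad high-order cell can therefore still serve 32-bit operands instead of being disabled entirely.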
-
Publication number: 20160103612
Abstract: Aspects include computing devices, systems, and methods for monitoring communications between components and a memory hierarchy of a computing device. The computing device may determine at least one identifying factor for identifying execution of the processor-executable code. A communication between the components and the memory hierarchy of the computing device may be monitored for at least one communication factor of a same type as the at least one identifying factor. A determination whether a value of the at least one identifying factor matches a value of the at least one communication factor may be made. The computing device may determine that the processor-executable code is executed in response to determining that the value of the at least one identifying factor matches the value of the at least one communication factor.
Type: Application
Filed: October 12, 2014
Publication date: April 14, 2016
Inventors: Mihai Christodorescu, Mastooreh Salajegheh, Dario Suarez Gracia
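The matching step can be sketched as a scan over observed memory transactions. Here the identifying factor is assumed to be an instruction fetch from the code's address range, and the communication factor is the fetch address seen between a component and memory; the `MemoryTransaction` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryTransaction:
    """Hypothetical record of one observed component-to-memory communication."""
    component: str
    kind: str  # e.g. "ifetch", "read", "write"
    address: int


def code_executed(transactions, code_start, code_end):
    """Identifying factor (assumed): an instruction fetch from [code_start, code_end)."""
    return any(t.kind == "ifetch" and code_start <= t.address < code_end
               for t in transactions)


trace = [MemoryTransaction("dma", "read", 0x4010),
         MemoryTransaction("cpu0", "ifetch", 0x4020)]
print(code_executed(trace[:1], 0x4000, 0x5000), code_executed(trace, 0x4000, 0x5000))  # False True
```

Requiring the factor types to match (a fetch compared with a fetch) is what keeps a plain data read of the same addresses from being counted as execution.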
-
Publication number: 20160041852
Abstract: Multi-processor computing device methods manage resource accesses by a signaling event manager signaling processor elements requesting access to a resource to wake up to access the resource when the resource is available or wait for an event when the resource is busy. Processor elements may enter a sleep state while awaiting access to the requested resource. When multiple elements are waiting for the resource, the processor element with a highest assigned priority is signaled to wake up when the resource is available without waking other elements. Priorities may be assigned to processor elements waiting for the resource based on a heuristic or parameter that may depend on a state of the computing device or the processor elements. A sleep duration may be estimated for a processor element waiting for a resource and the processor element may be removed from a scheduling queue or assigned another thread during the sleep duration.
Type: Application
Filed: August 5, 2014
Publication date: February 11, 2016
Inventors: Dario Suarez Gracia, Han Zhao, Pablo Montesinos Ortego, Gheorghe Calin Cascaval, James Xenidis
-
Publication number: 20150358810
Abstract: Methods, non-transitory processor-readable storage media, devices, and systems for improving user experience, energy consumption, and performance of a mobile device by automatically configuring applications.
Type: Application
Filed: June 10, 2014
Publication date: December 10, 2015
Inventors: Hui Chao, Dario Suarez Gracia, Gheorghe Calin Cascaval
-
Publication number: 20150205720
Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit on an inline cache pipeline connected to a processor pipeline.
Type: Application
Filed: April 28, 2014
Publication date: July 23, 2015
Applicant: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
-
Publication number: 20150205726
Abstract: Aspects include computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit on an inline cache pipeline connected to a processor pipeline.
Type: Application
Filed: April 28, 2014
Publication date: July 23, 2015
Applicant: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia