Patents by Inventor Dario Suarez Gracia

Dario Suarez Gracia has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Proactive resource management for parallel work-stealing processing systems

Patent number: 10360063

Abstract: Various embodiments proactively balance workloads between a plurality of processing units of a multi-processor computing device by making work-stealing determinations based on operating state data. An embodiment method includes obtaining static characteristics data associated with each of a victim processor and one or more of a plurality of processing units that are ready to steal work items from the victim processor (work-ready processors), obtaining dynamic characteristics data for each of the processors, calculating priority values for each of the processors based on the obtained data, and transferring a number of work items assigned to the victim processor to a winning work-ready processor based on the calculated priority values. In some embodiments, the method may include acquiring control over a probabilistic lock for a shared data structure and updating the shared data structure to indicate the number of work items transferred to the winning work-ready processor.

Type: Grant

Filed: September 23, 2015

Date of Patent: July 23, 2019

Assignee: QUALCOMM Incorporated

Inventors: Han Zhao, Dario Suárez Gracia, Tushar Kumar
Identifying enhanced synchronization operation outcomes to improve runtime operations

Patent number: 10114681

Abstract: Embodiments include computing devices, systems, and methods identifying enhanced synchronization operation outcomes. A computing device may receive a first resource access request for a first resource of a computing device including a first requester identifier from a first computing element of the computing device. The computing device may also receive a second resource access request for the first resource including a second requester identifier from a second computing element of the computing device. The computing device may grant the first computing element access to the first resource based on the first resource access request, and return a response to the second computing element including the first requester identifier as a winner computing element identifier.

Type: Grant

Filed: March 30, 2016

Date of Patent: October 30, 2018

Assignee: QUALCOMM Incorporated

Inventors: Dario Suarez Gracia, Gheorghe Cascaval, Han Zhao, Tushar Kumar, Aravind Natarajan, Arun Raman
Random-access disjoint concurrent sparse writes to heterogeneous buffers

Patent number: 10031697

Abstract: Methods, devices, and non-transitory processor-readable storage media for a computing device to merge concurrent writes from a plurality of processing units to a buffer associated with an application. An embodiment method executed by a processor may include identifying a plurality of concurrent requests to access the buffer that are sparse, disjoint, and write-only, configuring a write-set for each of the plurality of processing units, executing the plurality of concurrent requests to access the buffer using the write-sets, determining whether each of the plurality of concurrent requests to access the buffer is complete, obtaining a buffer index and data via the write-set of each of the plurality of processing units, and writing to the buffer using the received buffer index and data via the write-set of each of the plurality of processing units in response to determining that each of the plurality of concurrent requests to access the buffer is complete.

Type: Grant

Filed: January 19, 2016

Date of Patent: July 24, 2018

Assignee: QUALCOMM Incorporated

Inventors: Tushar Kumar, Aravind Natarajan, Dario Suarez Gracia
Identifying Enhanced Synchronization Operation Outcomes to Improve Runtime Operations

Publication number: 20170286182

Abstract: Embodiments include computing devices, systems, and methods identifying enhanced synchronization operation outcomes. A computing device may receive a first resource access request for a first resource of a computing device including a first requester identifier from a first computing element of the computing device. The computing device may also receive a second resource access request for the first resource including a second requester identifier from a second computing element of the computing device. The computing device may grant the first computing element access to the first resource based on the first resource access request, and return a response to the second computing element including the first requester identifier as a winner computing element identifier.

Type: Application

Filed: March 30, 2016

Publication date: October 5, 2017

Inventors: Dario Suarez Gracia, Gheorghe Cascaval, Han Zhao, Tushar Kumar, Aravind Natarajan, Arun Raman
Hardware acceleration for inline caches in dynamic languages

Patent number: 9740504

Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional until one an inline cache pipeline connected to a processor pipeline.

Type: Grant

Filed: April 28, 2014

Date of Patent: August 22, 2017

Assignee: QUALCOMM Incorporated

Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
Data management for multiple processing units using data transfer costs

Patent number: 9733978

Abstract: Various embodiments include methods for data management in a computing device utilizing a plurality of processing units. Embodiment methods may include generating a data transfer heuristic model based on measurements from a plurality of sample data transfers between a plurality of data storage units. The generated data transfer heuristic model may be used to calculate data transfer costs for each of a plurality of tasks. The calculated data transfer costs may be used to schedule execution of the plurality of tasks in an execution order on selected ones of the plurality of processing units. The data transfer heuristic model may be updated based on measurements of data transfers occurring during the executions of the plurality of tasks (e.g., time, power consumption, etc.). Code executing on the processing units may indicate to a runtime when certain data blocks are no longer needed and thus may be evicted and/or pre-fetched for others.

Type: Grant

Filed: August 27, 2015

Date of Patent: August 15, 2017

Assignee: QUALCOMM Incorporated

Inventors: Dario Suarez Gracia, Tushar Kumar, Aravind Natarajan, Ravish Hastantram, Gheorghe Calin Cascaval, Han Zhao
Random-Access Disjoint Concurrent Sparse Writes to Heterogeneous Buffers

Publication number: 20170206035

Abstract: Methods, devices, and non-transitory processor-readable storage media for a computing device to merge concurrent writes from a plurality of processing units to a buffer associated with an application. An embodiment method executed by a processor may include identifying a plurality of concurrent requests to access the buffer that are sparse, disjoint, and write-only, configuring a write-set for each of the plurality of processing units, executing the plurality of concurrent requests to access the buffer using the write-sets, determining whether each of the plurality of concurrent requests to access the buffer is complete, obtaining a buffer index and data via the write-set of each of the plurality of processing units, and writing to the buffer using the received buffer index and data via the write-set of each of the plurality of processing units in response to determining that each of the plurality of concurrent requests to access the buffer is complete.

Type: Application

Filed: January 19, 2016

Publication date: July 20, 2017

Inventors: Tushar Kumar, Aravind Natarajan, Dario Suarez Gracia
Hardware acceleration for inline caches in dynamic languages

Patent number: 9710388

Abstract: Aspects include a computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional until one an inline cache pipeline connected to a processor pipeline.

Type: Grant

Filed: April 28, 2014

Date of Patent: July 18, 2017

Assignee: QUALCOMM Incorporated

Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
Directed event signaling for multiprocessor systems

Patent number: 9632569

Abstract: Multi-processor computing device methods manage resource accesses by a signaling event manager signaling processor elements requesting access to a resource to wake up to access the resource when the resource is available or wait for an event when the resource is busy. Processor elements may enter a sleep state while awaiting access to the requested resource. When multiple elements are waiting for the resource, the processor element with a highest assigned priority is signaled to wake up when the resource is available without waking other elements. Priorities may be assigned to processor elements waiting for the resource based on a heuristic or parameter that may depend on a state of the computing device or the processor elements. A sleep duration may be estimated for a processor element waiting for a resource and the processor element may be removed from a scheduling queue or assigned another thread during the sleep duration.

Type: Grant

Filed: August 5, 2014

Date of Patent: April 25, 2017

Assignee: QUALCOMM Incorporated

Inventors: Dario Suarez Gracia, Han Zhao, Pablo Montesinos Ortego, Gheorghe Calin Cascaval, James Xenidis
Data-Driven Accelerator For Machine Learning And Raw Data Analysis

Publication number: 20170083827

Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for accelerating machine learning on a computing device. Raw data may be received in the computing device from a raw data source device. The apparatus may identify key features as two dimensional matrices of the raw data such that the key features are mutually exclusive from each other. The key features may be translated into key feature vectors. The computing device may generate a feature vector from at least one of the key feature vectors. The computing device may receive a first partial output resulting from an execution of a basic linear algebra subprogram (BLAS) operation using the feature vector and a weight factor. The first partial output may be combined with a plurality of partial outputs to produce an output matrix. Receiving the raw data on the computing device may include receiving streaming raw data.

Type: Application

Filed: September 23, 2015

Publication date: March 23, 2017

Inventors: Behnam Robatmili, Matthew Leslie Badin, Dario Suárez Gracia, Gheorghe Calin Cascaval, Nayeem Islam
Proactive Resource Management for Parallel Work-Stealing Processing Systems

Publication number: 20170083364

Abstract: Various embodiments proactively balance workloads between a plurality of processing units of a multi-processor computing device by making work-stealing determinations based on operating state data. An embodiment method includes obtaining static characteristics data associated with each of a victim processor and one or more of a plurality of processing units that are ready to steal work items from the victim processor (work-ready processors), obtaining dynamic characteristics data for each of the processors, calculating priority values for each of the processors based on the obtained data, and transferring a number of work items assigned to the victim processor to a winning work-ready processor based on the calculated priority values. In some embodiments, the method may include acquiring control over a probabilistic lock for a shared data structure and updating the shared data structure to indicate the number of work items transferred to the winning work-ready processor.

Type: Application

Filed: September 23, 2015

Publication date: March 23, 2017

Inventors: Han Zhao, Dario Suárez Gracia, Tushar Kumar
Data Management for Multiple Processing Units Using Data Transfer Costs

Publication number: 20170060633

Abstract: Various embodiments include methods for data management in a computing device utilizing a plurality of processing units. Embodiment methods may include generating a data transfer heuristic model based on measurements from a plurality of sample data transfers between a plurality of data storage units. The generated data transfer heuristic model may be used to calculate data transfer costs for each of a plurality of tasks. The calculated data transfer costs may be used to schedule execution of the plurality of tasks in an execution order on selected ones of the plurality of processing units. The data transfer heuristic model may be updated based on measurements of data transfers occurring during the executions of the plurality of tasks (e.g., time, power consumption, etc.). Code executing on the processing units may indicate to a runtime when certain data blocks are no longer needed and thus may be evicted and/or pre-fetched for others.

Type: Application

Filed: August 27, 2015

Publication date: March 2, 2017

Inventors: Dario Suarez Gracia, Tushar Kumar, Aravind Natarajan, Ravish Hastantram, Gheorghe Calin Cascaval, Han Zhao
Method for exploiting parallelism in task-based systems using an iteration space splitter

Patent number: 9501328

Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.

Type: Grant

Filed: March 30, 2015

Date of Patent: November 22, 2016

Assignee: QUALCOMM Incorporated

Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
METHOD FOR EXPLOITING PARALLELISM IN TASK-BASED SYSTEMS USING AN ITERATION SPACE SPLITTER

Publication number: 20160292012

Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.

Type: Application

Filed: March 30, 2015

Publication date: October 6, 2016

Inventors: Behnam ROBATMILI, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
SYSTEM AND METHOD FOR ADAPTIVELY MANAGING REGISTERS IN AN INSTRUCTION PROCESSOR

Publication number: 20160216969

Abstract: Systems and methods for adaptively managing registers in an instruction processor are disclosed. The system identifies one or more registers with inoperable cells. An operand manager identifies a set of operable cells within the one or more registers with inoperable cells and determines if a present instruction will use an operand that can be supported by the set of operable cells. When the set of operable cells can support the operand, the operand manager generates an assignment which is communicated to a register file manager.

Type: Application

Filed: January 28, 2015

Publication date: July 28, 2016

Inventors: Dario Suarez Gracia, Behnam Robatmili
Approximation of Execution Events Using Memory Hierarchy Monitoring

Publication number: 20160103612

Abstract: Aspects include computing devices, systems, and methods for implementing monitoring communications between components and a memory hierarchy of a computing device. The computing device may determine at least one identifying factor for identifying execution of the processor-executable code. A communication between the components and the memory hierarchy of the computing device may be monitored for at least one communication factor of a same type as the at least one identifying factor. A determination whether a value of the at least one identifying factor matches a value of the at least one communication factor may be made. The computing device may determine that the processor-executable code is executed in response to determining that the value of the at least one identifying factor matches the value of the at least one communication factor.

Type: Application

Filed: October 12, 2014

Publication date: April 14, 2016

Inventors: Mihai Christodorescu, Mastooreh Salajegheh, Dario Suarez Gracia
Directed Event Signaling For Multiprocessor Systems

Publication number: 20160041852

Abstract: Multi-processor computing device methods manage resource accesses by a signaling event manager signaling processor elements requesting access to a resource to wake up to access the resource when the resource is available or wait for an event when the resource is busy. Processor elements may enter a sleep state while awaiting access to the requested resource. When multiple elements are waiting for the resource, the processor element with a highest assigned priority is signaled to wake up when the resource is available without waking other elements. Priorities may be assigned to processor elements waiting for the resource based on a heuristic or parameter that may depend on a state of the computing device or the processor elements. A sleep duration may be estimated for a processor element waiting for a resource and the processor element may be removed from a scheduling queue or assigned another thread during the sleep duration.

Type: Application

Filed: August 5, 2014

Publication date: February 11, 2016

Inventors: Dario Suarez Gracia, Han Zhao, Pablo Montesinos Ortego, Gheorghe Calin Cascaval, James Xenidis
Software Configurations for Mobile Devices in a Collaborative Environment

Publication number: 20150358810

Abstract: Methods, non-transitory processor-readable storage media, devices, and systems for improving user experience, energy consumption, and performance of a mobile device by automatically configuring applications.

Type: Application

Filed: June 10, 2014

Publication date: December 10, 2015

Inventors: Hui Chao, Dario Suarez Gracia, Gheorghe Calin Cascaval
Hardware Acceleration For Inline Caches In Dynamic Languages

Publication number: 20150205720

Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional until one an inline cache pipeline connected to a processor pipeline.

Type: Application

Filed: April 28, 2014

Publication date: July 23, 2015

Applicant: QUALCOMM Incorporated

Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlay, Dario Suarez Gracia
Hardware Acceleration for Inline Caches in Dynamic Languages

Publication number: 20150205726

Abstract: Aspects include a computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional until one an inline cache pipeline connected to a processor pipeline.

Type: Application

Filed: April 28, 2014

Publication date: July 23, 2015

Applicant: QUALCOMM Incorporated

Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia