Patents by Inventor Mauricio Breternitz

Mauricio Breternitz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Processing device with independently activatable working memory bank and methods

Patent number: 8935472

Abstract: A data processing device is provided that includes an array of working memory banks and an associated processing engine. The working memory bank array is configured with at least one independently activatable memory bank. A dirty data counter (DDC) is associated with the independently activatable memory bank and is configured to reflect a count of dirty data migrated from the independently activatable memory bank upon selective deactivation of the independently activatable memory bank. The DDC is configured to selectively decrement the count of dirty data upon the reactivation of the independently activatable memory bank in connection with a transient state. In the transient state, each dirty data access by the processing engine to the reactivated memory bank is also conducted with respect to another memory bank of the array. Upon a condition that dirty data is found in the other memory bank, the count of dirty data is decremented.

Type: Grant

Filed: December 21, 2012

Date of Patent: January 13, 2015

Assignee: Advanced Micro Devices, Inc.

Inventors: Mithuna Thottethodi, Gabriel Loh, Mauricio Breternitz, James O'Connor, Yasuko Eckert
Processing system using virtual network interface controller addressing as flow control metadata

Patent number: 8929220

Abstract: In a processing system comprising a plurality of processing nodes coupled via a switching fabric, a method includes implementing a flow control property for a data flow in the switching fabric based on an addressing property of an address of a virtual network interface controller associated with the data flow. A switching fabric includes a plurality of ports, each port coupleable to a corresponding processing node, and switching logic coupled to the plurality of ports. The switching fabric further includes flow control logic to implement a flow control property for a data flow in the switching logic based on an addressing property of an address of a virtual network interface controller associated with the data flow.

Type: Grant

Filed: August 24, 2012

Date of Patent: January 6, 2015

Assignee: Advanced Micro Devices, Inc.

Inventors: Mauricio Breternitz, Jr., Anton Chernoff, Mark D. Hummel
COMBINED DYNAMIC AND STATIC POWER AND PERFORMANCE OPTIMIZATION ON DATA CENTERS

Publication number: 20140372782

Abstract: Various datacenter or other computing center control apparatus and methods are disclosed. In one aspect, a method of computing is provided that includes defining plural processor performance bins where each processor performance bin has a processor performance state. At least one processor is assigned to each of the plural processor performance bins. Processor performance metrics of at least one of the processors are monitored while the at least one of the processors executes an incoming task. Processor power is modeled based on the monitored performance metrics. Future incoming tasks are assigned to one of the processor performance bins based on the modeled processor power.

Type: Application

Filed: June 13, 2013

Publication date: December 18, 2014

Inventors: Mauricio Breternitz, Leonardo Piga, Patryk Kaminski
WORKLOAD PARTITIONING AMONG HETEROGENEOUS PROCESSING NODES

Publication number: 20140359126

Abstract: A method of computing is performed in a first processing node of a plurality of processing nodes of multiple types with distinct processing capabilities. The method includes, in response to a command, partitioning data associated with the command among the plurality of processing nodes. The data is partitioned based at least in part on the distinct processing capabilities of the multiple types of processing nodes.

Type: Application

Filed: June 3, 2013

Publication date: December 4, 2014

Inventors: Mauricio Breternitz, Gary Frost
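
The abstract above describes partitioning a command's data among node types according to their processing capabilities. A minimal sketch of one way this might look, assuming each node is summarized by a single relative throughput number and that the data is split into contiguous chunks proportional to it. The function name `partition_by_capability` and the capability figures are hypothetical and do not come from the application.

```python
def partition_by_capability(items, capabilities):
    """Split items into contiguous chunks sized in proportion to each
    node's relative capability (hypothetical scalar weights)."""
    total = sum(capabilities.values())
    if total <= 0:
        raise ValueError("capabilities must sum to a positive value")
    nodes = list(capabilities)
    chunks = {}
    start = 0
    accumulated = 0.0
    for index, node in enumerate(nodes):
        accumulated += capabilities[node]
        # The last node takes whatever remains so that no item is dropped.
        if index == len(nodes) - 1:
            end = len(items)
        else:
            end = round(len(items) * accumulated / total)
        chunks[node] = items[start:end]
        start = end
    return chunks


# Assumed example: a GPU-like node rated four times a CPU-like node.
parts = partition_by_capability(list(range(10)), {"cpu": 1, "gpu": 4})
```

Proportional splitting is only one possible policy; the abstract states only that the partition depends at least in part on the distinct capabilities.
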
THREAD ASSIGNMENT FOR POWER AND PERFORMANCE EFFICIENCY USING MULTIPLE POWER STATES

Publication number: 20140359633

Abstract: A method is performed in a computing system that includes a plurality of processing nodes of multiple types configurable to run in multiple performance states. In the method, an application executes on a thread assigned to a first processing node. Power and performance of the application on the first processing node are estimated. Power and performance of the application in multiple performance states on other processing nodes of the plurality of processing nodes besides the first processing node are also estimated. It is determined that the estimated power and performance of the application on a second processing node in a respective performance state of the multiple performance states is preferable to the power and performance of the application on the first processing node. The thread is reassigned to the second processing node, with the second processing node in the respective performance state.

Type: Application

Filed: June 4, 2013

Publication date: December 4, 2014

Inventors: Mauricio Breternitz, Leonardo Piga
POWER-EFFICIENT NESTED MAP-REDUCE EXECUTION ON A CLOUD OF HETEROGENEOUS ACCELERATED PROCESSING UNITS

Publication number: 20140333638

Abstract: An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on a ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics results in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map reduce framework execution.

Type: Application

Filed: May 9, 2013

Publication date: November 13, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Patryk KAMINSKI, Mauricio Breternitz, Gary R. Frost, Christophe Harle
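
The two metrics named in the abstract above (a branch-to-non-branch instruction ratio and a CPU-versus-GPU execution time comparison) can be illustrated with a short sketch. How the metrics are combined is not stated in the abstract, so the rule below is an assumption: branch-heavy functions go to the CPU, and otherwise the faster device wins. The threshold `max_gpu_branch_ratio` and both function names are hypothetical.

```python
def branch_ratio(branch_count, non_branch_count):
    """First metric: branch instructions per non-branch instruction."""
    if non_branch_count == 0:
        return float("inf")
    return branch_count / non_branch_count


def choose_device(branch_count, non_branch_count, cpu_time, gpu_time,
                  max_gpu_branch_ratio=0.2):
    """Pick "cpu" or "gpu" for a map or reduce function.

    Assumed combination rule (not taken from the application): code with
    a high branch ratio stays on the CPU; otherwise the device with the
    shorter measured execution time is chosen.
    """
    if branch_ratio(branch_count, non_branch_count) > max_gpu_branch_ratio:
        return "cpu"
    return "gpu" if gpu_time < cpu_time else "cpu"


# Assumed measurements for a mostly straight-line map function.
device = choose_device(branch_count=5, non_branch_count=100,
                       cpu_time=1.0, gpu_time=0.5)
```
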
System and method for configuring cloud computing systems

Patent number: 8887056

Abstract: The present disclosure relates to a method, system, and apparatus for configuring a computing system, such as a cloud computing system. A method includes, based on user selections received via a user interface, configuring a cluster of nodes by selecting the cluster of nodes from a plurality of available nodes, selecting a workload container module from a plurality of available workload container modules for operation on each node of the selected cluster of nodes, and selecting a workload for execution with the workload container on the cluster of nodes. Each node of the cluster of nodes includes at least one processing device and memory, and the cluster of nodes is operative to share processing of a workload.

Type: Grant

Filed: August 7, 2012

Date of Patent: November 11, 2014

Assignee: Advanced Micro Devices, Inc.

Inventors: Mauricio Breternitz, Keith A. Lowery, Patryk Kaminski, Anton Chernoff
INSTRUCTION BOUNDARY PREDICTION FOR VARIABLE LENGTH INSTRUCTION SET

Publication number: 20140281246

Abstract: A system, processor, and method to predict with high accuracy and retain instruction boundaries for previously executed instructions in order to decode variable length instructions is disclosed. In at least one embodiment, a disclosed processor includes an instruction fetch unit, an instruction cache, a boundary byte predictor, and an instruction decoder. In some embodiments, the instruction fetch unit provides an instruction address and the instruction cache produces an instruction tag and instruction cache content corresponding to the instruction address. The instruction decoder, in some embodiments, includes boundary byte logic to determine an instruction boundary in the instruction cache content.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Inventors: Mauricio Breternitz, JR., Youfeng Wu, Peter Sassone, James Mason, Aashish Phansalkar, Balaji Vijayan
BENCHMARK GENERATION USING INSTRUCTION EXECUTION INFORMATION

Publication number: 20140258688

Abstract: Methods and systems are provided for generating a benchmark representative of a reference process. One method involves obtaining execution information for a subset of the plurality of instructions of the reference process from a pipeline of a processing module during execution of those instructions by the processing module, determining performance characteristics quantifying the execution behavior of the reference process based on the execution information, and generating the benchmark process that mimics the quantified execution behavior of the reference process based on the performance characteristics.

Type: Application

Filed: March 7, 2013

Publication date: September 11, 2014

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Mauricio Breternitz, Anton Chernoff, Keith A. Lowery
Selecting a Resource from a Set of Resources for Performing an Operation

Publication number: 20140223445

Abstract: The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism is configured to perform a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation and until a resource is selected for performing the operation, the selection mechanism is configured to identify a next resource in the table and select the next resource for performing the operation when the next resource is available for performing the operation.

Type: Application

Filed: February 7, 2013

Publication date: August 7, 2014

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Bradford M. Beckmann, Mithuna S. Thottethodi, James M. O'Connor, Mauricio Breternitz, Lisa R. Hsu, Gabriel H. Loh, Yasuko Eckert
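
The selection mechanism in the abstract above (look up a resource in a chosen table, then move on to the next entry while the identified one is unavailable) can be sketched as follows. This is an illustration under assumptions, not the claimed design: the indexing by `key % len(table)`, the wrap-around order, and the `is_available` callback are hypothetical, and the sketch gives up after one full pass through the table rather than waiting for a resource to become free.

```python
def select_resource(tables, table_id, key, is_available):
    """Return the first available resource, starting from the table
    entry that `key` maps to and stepping through the following entries."""
    table = tables[table_id]
    start = key % len(table)
    for offset in range(len(table)):
        candidate = table[(start + offset) % len(table)]
        if is_available(candidate):
            return candidate
    return None  # nothing in this table is currently available


# Assumed example: three cache banks, only one of which is powered on.
tables = {0: ["bank0", "bank1", "bank2"]}
powered_on = {"bank2"}
chosen = select_resource(tables, 0, key=1,
                         is_available=lambda r: r in powered_on)
```
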
Automatic load balancing for heterogeneous cores

Patent number: 8782645

Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.

Type: Grant

Filed: May 11, 2011

Date of Patent: July 15, 2014

Assignee: Advanced Micro Devices, Inc.

Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
MECHANISMS TO BOUND THE PRESENCE OF CACHE BLOCKS WITH SPECIFIC PROPERTIES IN CACHES

Publication number: 20140181414

Abstract: A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response to a write request to a second bank for data indicated to be stored in the powered down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, then the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, then the cache controller stores the data with a clean state in the second bank. The cache controller writes the data in a lower-level memory for both cases.

Type: Application

Filed: October 16, 2013

Publication date: June 26, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Yasuko Eckert, Gabriel H. Loh, Mauricio Breternitz, James M. O'Connor, Srilatha Manne, Nuwan S. Jayasena, Mithuna S. Thottethodi
PROCESSING DEVICE WITH INDEPENDENTLY ACTIVATABLE WORKING MEMORY BANK AND METHODS

Publication number: 20140181411

Abstract: A data processing device is provided that includes an array of working memory banks and an associated processing engine. The working memory bank array is configured with at least one independently activatable memory bank. A dirty data counter (DDC) is associated with the independently activatable memory bank and is configured to reflect a count of dirty data migrated from the independently activatable memory bank upon selective deactivation of the independently activatable memory bank. The DDC is configured to selectively decrement the count of dirty data upon the reactivation of the independently activatable memory bank in connection with a transient state. In the transient state, each dirty data access by the processing engine to the reactivated memory bank is also conducted with respect to another memory bank of the array. Upon a condition that dirty data is found in the other memory bank, the count of dirty data is decremented.

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Mithuna Thottethodi, Gabriel Loh, Mauricio Breternitz, James O'Connor, Yasuko Eckert
SPILL DATA MANAGEMENT

Publication number: 20140164708

Abstract: A processor discards spill data from a memory hierarchy in response to the final access to the spill data having been performed by a compiled program executing at the processor. In some embodiments, the final access is determined based on a special-purpose load instruction configured for this purpose. In some embodiments, the determination is made based on the location of a stack pointer indicating that a method of the executing program has returned, so that data of the returned method that remains in the stack frame is no longer to be accessed. Because the spill data is discarded after the final access, it is not transferred through the memory hierarchy.

Type: Application

Filed: December 7, 2012

Publication date: June 12, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Mauricio Breternitz, JR., James M. O'Connor, Srilatha Manne, Yasuko Eckert
Tracking Non-Native Content in Caches

Publication number: 20140156941

Abstract: The described embodiments include a cache with a plurality of banks that includes a cache controller. In these embodiments, the cache controller determines a value representing non-native cache blocks stored in at least one bank in the cache, wherein a cache block is non-native to a bank when a home for the cache block is in a predetermined location relative to the bank. Then, based on the value representing non-native cache blocks stored in the at least one bank, the cache controller determines at least one bank in the cache to be transitioned from a first power mode to a second power mode. Next, the cache controller transitions the determined at least one bank in the cache from the first power mode to the second power mode.

Type: Application

Filed: November 30, 2012

Publication date: June 5, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Gabriel H. Loh, Mithuna S. Thottethodi, Yasuko Eckert, James M. O'Connor, Mauricio Breternitz, Bradford M. Beckmann, Nuwan Jayasena
Processor with garbage-collection based classification of memory

Patent number: 8738877

Abstract: Improved memory management in a processor is provided using garbage collection utilities. The processor includes higher performance memory units and lower performance memory units and a memory management unit. The memory management unit includes a garbage collection utility programmed to identify high use memory blocks and low use memory blocks within the higher and lower performance memory units. The memory management unit is also configured to move the high use memory blocks to higher performance memory and move the low use memory blocks to lower performance memory. The method comprises determining performance characteristics of available memory to identify higher performance memory and lower performance memory. Next, memory block use metrics are analyzed to identify high use memory blocks and low use memory blocks. Finally, high use memory blocks are moved to the higher performance memory while the low use memory blocks are moved to the lower performance memory.

Type: Grant

Filed: December 14, 2011

Date of Patent: May 27, 2014

Assignee: Advanced Micro Devices, Inc.

Inventors: Gabriel H. Loh, Mauricio Breternitz
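
The placement step in the abstract above (high use blocks to higher performance memory, low use blocks to lower performance memory) can be sketched in a few lines. The sketch assumes that a block's use metric is a plain access count and that the higher performance memory holds a fixed number of blocks. Both assumptions, along with the names `classify_blocks` and `fast_capacity`, are hypothetical and are not taken from the patent.

```python
def classify_blocks(access_counts, fast_capacity):
    """Map each block to "fast" or "slow" memory by ranking blocks on
    access count and filling the fast tier first."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return {
        block: ("fast" if rank < fast_capacity else "slow")
        for rank, block in enumerate(ranked)
    }


# Assumed access counts, as might be gathered during a collection pass.
placement = classify_blocks({"a": 100, "b": 3, "c": 50}, fast_capacity=2)
```

A real collector would gather the use metrics while it traces or copies objects; the ranking here only stands in for that analysis.
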
TRACKING MEMORY BANK UTILITY AND COST FOR INTELLIGENT SHUTDOWN DECISIONS

Publication number: 20140136870

Abstract: A device receives an indication that a memory bank is to be powered down, and determines, based on receiving the indication, shutdown scores corresponding to powered up memory banks. Each shutdown score is based on a shutdown metric associated with powering down a powered up memory bank. The device may power down a selected memory bank based on the shutdown scores.

Type: Application

Filed: November 14, 2012

Publication date: May 15, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Mauricio BRETERNITZ, James M. O'CONNOR, Gabriel H. LOH, Yasuko ECKERT, Mithuna THOTTETHODI, Srilatha MANNE, Bradford M. BECKMANN
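
The abstract above scores each powered up memory bank and then powers down a bank chosen from those scores. A minimal sketch follows, assuming the shutdown metric combines the cost of writing back dirty lines with the bank's recent usefulness and that the lowest score is selected. The `Bank` fields, the weights, and the function names are hypothetical; the application does not specify them here.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    name: str
    powered_up: bool
    dirty_lines: int   # lines that would need writing back on shutdown
    recent_hits: int   # rough measure of how useful the bank currently is


def shutdown_score(bank, writeback_weight=2.0, utility_weight=1.0):
    """Assumed metric: a higher score means the bank costs more to shut down."""
    return writeback_weight * bank.dirty_lines + utility_weight * bank.recent_hits


def pick_bank_to_power_down(banks):
    """Return the powered up bank with the lowest shutdown score, or None."""
    candidates = [bank for bank in banks if bank.powered_up]
    if not candidates:
        return None
    return min(candidates, key=shutdown_score)


banks = [
    Bank("b0", True, dirty_lines=10, recent_hits=100),
    Bank("b1", True, dirty_lines=1, recent_hits=5),
    Bank("b2", False, dirty_lines=0, recent_hits=0),
]
victim = pick_bank_to_power_down(banks)
```
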
TRACKING MEMORY BANK UTILITY AND COST FOR INTELLIGENT POWER UP DECISIONS

Publication number: 20140136873

Abstract: A device receives an indication that a memory bank is to be powered up, and determines, based on receiving the indication, power scores corresponding to powered down memory banks. Each power score corresponds to a power metric associated with powering up a powered down memory bank. The device powers up a selected memory bank based on the plurality of power scores.

Type: Application

Filed: November 14, 2012

Publication date: May 15, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Mauricio BRETERNITZ, James M. O'CONNOR, Gabriel H. LOH, Yasuko ECKERT, Mithuna THOTTETHODI, Srilatha MANNE, Bradford M. BECKMANN
SEMI-STATIC POWER AND PERFORMANCE OPTIMIZATION OF DATA CENTERS

Publication number: 20140108828

Abstract: A device may receive information that identifies a first task to be processed, may determine a performance metric value indicative of a behavior of a processor while processing a second task, and may assign, based on the performance metric value, the first task to a bin for processing the first task, the bin including a set of processors that operate based on a power characteristic.

Type: Application

Filed: October 15, 2012

Publication date: April 17, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Mauricio BRETERNITZ, Leonardo Piga
Automatic kernel migration for heterogeneous cores

Patent number: 8683468

Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.

Type: Grant

Filed: May 16, 2011

Date of Patent: March 25, 2014

Assignee: Advanced Micro Devices, Inc.

Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
