Patents by Inventor Mauricio Breternitz

Mauricio Breternitz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8935472
    Abstract: A data processing device is provided that includes an array of working memory banks and an associated processing engine. The working memory bank array is configured with at least one independently activatable memory bank. A dirty data counter (DDC) is associated with the independently activatable memory bank and is configured to reflect a count of dirty data migrated from the independently activatable memory bank upon selective deactivation of the independently activatable memory bank. The DDC is configured to selectively decrement the count of dirty data upon the reactivation of the independently activatable memory bank in connection with a transient state. In the transient state, each dirty data access by the processing engine to the reactivated memory bank is also conducted with respect to another memory bank of the array. Upon a condition that dirty data is found in the other memory bank, the count of dirty data is decremented.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: January 13, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mithuna Thottethodi, Gabriel Loh, Mauricio Breternitz, James O'Connor, Yasuko Eckert
  • Patent number: 8929220
    Abstract: In a processing system comprising a plurality of processing nodes coupled via a switching fabric, a method includes implementing a flow control property for a data flow in the switching fabric based on an addressing property of an address of a virtual network interface controller associated with the data flow. A switching fabric includes a plurality of ports, each port coupleable to a corresponding processing node, and switching logic coupled to the plurality of ports. The switching fabric further includes flow control logic to implement a flow control property for a data flow in the switching logic based on an addressing property of an address of a virtual network interface controller associated with the data flow.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: January 6, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Jr., Anton Chernoff, Mark D. Hummel
  • Publication number: 20140372782
    Abstract: Various datacenter or other computing center control apparatus and methods are disclosed. In one aspect, a method of computing is provided that includes defining plural processor performance bins where each processor performance bin has a processor performance state. At least one processor is assigned to each of the plural processor performance bins. Processor performance metrics of at least one of the processors are monitored while the at least one of the processors executes an incoming task. Processor power is modeled based on the monitored performance metrics. Future incoming tasks are assigned to one of the processor performance bins based on the modeled processor power.
    Type: Application
    Filed: June 13, 2013
    Publication date: December 18, 2014
    Inventors: Mauricio Breternitz, Leonardo Piga, Patryk Kaminski
  • Publication number: 20140359126
    Abstract: A method of computing is performed in a first processing node of a plurality of processing nodes of multiple types with distinct processing capabilities. The method includes, in response to a command, partitioning data associated with the command among the plurality of processing nodes. The data is partitioned based at least in part on the distinct processing capabilities of the multiple types of processing nodes.
    Type: Application
    Filed: June 3, 2013
    Publication date: December 4, 2014
    Inventors: Mauricio Breternitz, Gary Frost
  • Publication number: 20140359633
    Abstract: A method is performed in a computing system that includes a plurality of processing nodes of multiple types configurable to run in multiple performance states. In the method, an application executes on a thread assigned to a first processing node. Power and performance of the application on the first processing node is estimated. Power and performance of the application in multiple performance states on other processing nodes of the plurality of processing nodes besides the first processing node is also estimated. It is determined that the estimated power and performance of the application on a second processing node in a respective performance state of the multiple performance states is preferable to the power and performance of the application on the first processing node. The thread is reassigned to the second processing node, with the second processing node in the respective performance state.
    Type: Application
    Filed: June 4, 2013
    Publication date: December 4, 2014
    Inventors: Mauricio Breternitz, Leonardo Piga
  • Publication number: 20140333638
    Abstract: An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) are described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on a ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on a comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics results in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map-reduce framework execution.
    Type: Application
    Filed: May 9, 2013
    Publication date: November 13, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Patryk Kaminski, Mauricio Breternitz, Gary R. Frost, Christophe Harle
  • Patent number: 8887056
    Abstract: The present disclosure relates to a method, system, and apparatus for configuring a computing system, such as a cloud computing system. A method includes, based on user selections received via a user interface, configuring a cluster of nodes by selecting the cluster of nodes from a plurality of available nodes, selecting a workload container module from a plurality of available workload container modules for operation on each node of the selected cluster of nodes, and selecting a workload for execution with the workload container on the cluster of nodes. Each node of the cluster of nodes includes at least one processing device and memory, and the cluster of nodes is operative to share processing of a workload.
    Type: Grant
    Filed: August 7, 2012
    Date of Patent: November 11, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Keith A. Lowery, Patryk Kaminski, Anton Chernoff
  • Publication number: 20140281246
    Abstract: A system, processor, and method to predict with high accuracy and retain instruction boundaries for previously executed instructions in order to decode variable-length instructions are disclosed. In at least one embodiment, a disclosed processor includes an instruction fetch unit, an instruction cache, a boundary byte predictor, and an instruction decoder. In some embodiments, the instruction fetch unit provides an instruction address and the instruction cache produces an instruction tag and instruction cache content corresponding to the instruction address. The instruction decoder, in some embodiments, includes boundary byte logic to determine an instruction boundary in the instruction cache content.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventors: Mauricio Breternitz, Jr., Youfeng Wu, Peter Sassone, James Mason, Aashish Phansalkar, Balaji Vijayan
  • Publication number: 20140258688
    Abstract: Methods and systems are provided for generating a benchmark representative of a reference process. One method involves obtaining execution information for a subset of the plurality of instructions of the reference process from a pipeline of a processing module during execution of those instructions by the processing module, determining performance characteristics quantifying the execution behavior of the reference process based on the execution information, and generating the benchmark process that mimics the quantified execution behavior of the reference process based on the performance characteristics.
    Type: Application
    Filed: March 7, 2013
    Publication date: September 11, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Anton Chernoff, Keith A. Lowery
  • Publication number: 20140223445
    Abstract: The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism is configured to perform a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation and until a resource is selected for performing the operation, the selection mechanism is configured to identify a next resource in the table and select the next resource for performing the operation when the next resource is available for performing the operation.
    Type: Application
    Filed: February 7, 2013
    Publication date: August 7, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Bradford M. Beckmann, Mithuna S. Thottethodi, James M. O'Connor, Mauricio Breternitz, Lisa R. Hsu, Gabriel H. Loh, Yasuko Eckert
  • Patent number: 8782645
    Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.
    Type: Grant
    Filed: May 11, 2011
    Date of Patent: July 15, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
  • Publication number: 20140181414
    Abstract: A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response to a write request to a second bank for data indicated to be stored in the powered-down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, then the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, then the cache controller stores the data with a clean state in the second bank. The cache controller writes the data in a lower-level memory for both cases.
    Type: Application
    Filed: October 16, 2013
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Yasuko Eckert, Gabriel H. Loh, Mauricio Breternitz, James M. O'Connor, Srilatha Manne, Nuwan S. Jayasena, Mithuna S. Thottethodi
  • Publication number: 20140181411
    Abstract: A data processing device is provided that includes an array of working memory banks and an associated processing engine. The working memory bank array is configured with at least one independently activatable memory bank. A dirty data counter (DDC) is associated with the independently activatable memory bank and is configured to reflect a count of dirty data migrated from the independently activatable memory bank upon selective deactivation of the independently activatable memory bank. The DDC is configured to selectively decrement the count of dirty data upon the reactivation of the independently activatable memory bank in connection with a transient state. In the transient state, each dirty data access by the processing engine to the reactivated memory bank is also conducted with respect to another memory bank of the array. Upon a condition that dirty data is found in the other memory bank, the count of dirty data is decremented.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mithuna Thottethodi, Gabriel Loh, Mauricio Breternitz, James O'Connor, Yasuko Eckert
  • Publication number: 20140164708
    Abstract: A processor discards spill data from a memory hierarchy in response to determining that the final access to the spill data has been performed by a compiled program executing at the processor. In some embodiments, the final access is determined based on a special-purpose load instruction configured for this purpose. In some embodiments, the determination is made based on the location of a stack pointer indicating that a method of the executing program has returned, so that data of the returned method that remains in the stack frame is no longer to be accessed. Because the spill data is discarded after the final access, it is not transferred through the memory hierarchy.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Jr., James M. O'Connor, Srilatha Manne, Yasuko Eckert
  • Publication number: 20140156941
    Abstract: The described embodiments include a cache with a plurality of banks that includes a cache controller. In these embodiments, the cache controller determines a value representing non-native cache blocks stored in at least one bank in the cache, wherein a cache block is non-native to a bank when a home for the cache block is in a predetermined location relative to the bank. Then, based on the value representing non-native cache blocks stored in the at least one bank, the cache controller determines at least one bank in the cache to be transitioned from a first power mode to a second power mode. Next, the cache controller transitions the determined at least one bank in the cache from the first power mode to the second power mode.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 5, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, Mithuna S. Thottethodi, Yasuko Eckert, James M. O'Connor, Mauricio Breternitz, Bradford M. Beckmann, Nuwan Jayasena
  • Patent number: 8738877
    Abstract: Improved memory management in a processor is provided using garbage collection utilities. The processor includes higher performance memory units and lower performance memory units and a memory management unit. The memory management unit includes a garbage collection utility programmed to identify high use memory blocks and low use memory blocks within the higher and lower performance memory units. The memory management unit is also configured to move the high use memory blocks to higher performance memory and move the low use memory blocks to lower performance memory. The method comprises determining performance characteristics of available memory to identify higher performance memory and lower performance memory. Next, memory block use metrics are analyzed to identify high use memory blocks and low use memory blocks. Finally, high use memory blocks are moved to the higher performance memory while the low use memory blocks are moved to the lower performance memory.
    Type: Grant
    Filed: December 14, 2011
    Date of Patent: May 27, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, Mauricio Breternitz
  • Publication number: 20140136870
    Abstract: A device receives an indication that a memory bank is to be powered down, and determines, based on receiving the indication, shutdown scores corresponding to powered up memory banks. Each shutdown score is based on a shutdown metric associated with powering down a powered up memory bank. The device may power down a selected memory bank based on the shutdown scores.
    Type: Application
    Filed: November 14, 2012
    Publication date: May 15, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, James M. O'Connor, Gabriel H. Loh, Yasuko Eckert, Mithuna Thottethodi, Srilatha Manne, Bradford M. Beckmann
  • Publication number: 20140136873
    Abstract: A device receives an indication that a memory bank is to be powered up, and determines, based on receiving the indication, power scores corresponding to powered down memory banks. Each power score corresponds to a power metric associated with powering up a powered down memory bank. The device powers up a selected memory bank based on the plurality of power scores.
    Type: Application
    Filed: November 14, 2012
    Publication date: May 15, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, James M. O'Connor, Gabriel H. Loh, Yasuko Eckert, Mithuna Thottethodi, Srilatha Manne, Bradford M. Beckmann
  • Publication number: 20140108828
    Abstract: A device may receive information that identifies a first task to be processed, may determine a performance metric value indicative of a behavior of a processor while processing a second task, and may assign, based on the performance metric value, the first task to a bin for processing the first task, the bin including a set of processors that operate based on a power characteristic.
    Type: Application
    Filed: October 15, 2012
    Publication date: April 17, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Leonardo Piga
  • Patent number: 8683468
    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
    Type: Grant
    Filed: May 16, 2011
    Date of Patent: March 25, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
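
The migration flow described in the abstract of patent 8683468 can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not the patented implementation: `MigrationPoint`, `Core`, `run_with_migration`, and the `should_migrate` callback are hypothetical names invented for illustration, and the two "cores" are plain in-memory objects rather than real hardware.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationPoint:
    """Hypothetical stand-in for the compiler-created data structure."""
    location: str      # label of the program location where migration may occur
    live_names: tuple  # names of values still needed after that location


@dataclass
class Core:
    """Hypothetical model of a processor core and the values visible to it."""
    name: str
    memory: dict = field(default_factory=dict)


def run_with_migration(data, point, simd_core, cpu_core, should_migrate):
    # Code before the migration point is scheduled on the first (SIMD) core.
    simd_core.memory["squares"] = [x * x for x in data]
    simd_core.memory["count"] = len(data)
    simd_core.memory["scratch"] = "temporary, not live past the migration point"

    target = simd_core
    if should_migrate(simd_core):
        # Migration condition satisfied: copy only the live values that the
        # (assumed) compiler record lists, so the second core can read them.
        for name in point.live_names:
            cpu_core.memory[name] = simd_core.memory[name]
        target = cpu_core

    # Code after the migration point runs on whichever core was chosen.
    mean = sum(target.memory["squares"]) / target.memory["count"]
    return target.name, mean


point = MigrationPoint(location="after_square_loop", live_names=("squares", "count"))
simd, cpu = Core("simd"), Core("cpu")
ran_on, mean_square = run_with_migration([1, 2, 3, 4], point, simd, cpu, lambda core: True)
print(ran_on, mean_square)
```

In this sketch, only the values named in `live_names` are copied, so the `scratch` entry stays behind on the first core. That mirrors the abstract's point that the data structure identifies which live values must move.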