Patents by Inventor Michael Mantor

Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8959319
    Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: February 17, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
  • Patent number: 8933942
    Abstract: Embodiments describe herein provide an apparatus, a computer readable medium and a method for simultaneously processing tasks within an APD. The method includes processing a first task within an APD. The method also includes reducing utilization of the APD by the first task to facilitate simultaneous processing of the second task, such that the utilization remains below a threshold.
    Type: Grant
    Filed: December 8, 2011
    Date of Patent: January 13, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Thomas Roy Woller, Kevin McGrath, Rex McCrary, Philip J. Rogers, Mark Leather
  • Publication number: 20140362102
    Abstract: A GPU is configured to read and process data produced by a compute shader via the one or more ring buffers and pass the resulting processed data to a vertex shader as input. The GPU is further configured to allow the compute shader and vertex shader to write through a cache. Each ring buffer is configured to synchronize the compute shader and the vertex shader to prevent processed data generated by the compute shader that is written to a particular ring buffer from being overwritten before the data is accessed by the vertex shader. It is emphasized that this abstract is provided to comply with the rules requiring an abstract that will allow a searcher or other reader to quickly ascertain the subject matter of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.
    Type: Application
    Filed: June 5, 2014
    Publication date: December 11, 2014
    Inventors: Mark Evan Cerny, David Simpson, Jason Scanlin, Michael Mantor
  • Patent number: 8854381
    Abstract: A processing unit that includes a plurality of virtual engines and a shader core. The plurality of virtual engines is configured to (i) receive, from an operating system (OS), a plurality of tasks substantially in parallel with each other and (ii) load a set of state data associated with each of the plurality of tasks. The shader core is configured to execute the plurality of tasks substantially in parallel based on the set of state data associated with each of the plurality of tasks. The processing unit may also include a scheduling module that schedules the plurality of tasks to be issued to the shader core.
    Type: Grant
    Filed: September 1, 2010
    Date of Patent: October 7, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Mantor, Rex McCrary
  • Publication number: 20140292756
    Abstract: A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.
    Type: Application
    Filed: March 29, 2013
    Publication date: October 2, 2014
    Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kallio Kia, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
  • Patent number: 8826294
    Abstract: The present invention provides an efficient state management system for a complex ASIC, and applications thereof. In an embodiment, a computer-based system executes state-dependent processes. The computer-based system includes a command processor (CP) and a plurality of processing blocks. The CP receives commands in a command stream and manages a global state responsive to global context events in the command stream. The plurality of processing blocks receive the commands in the command stream and manage respective block states responsive to block context events in the command stream. Each respective processing block executes a process on data in a data stream based on the global state and the block state of the respective processing block.
    Type: Grant
    Filed: December 22, 2008
    Date of Patent: September 2, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Mantor, Rex Eldon McCrary
  • Patent number: 8803891
    Abstract: Embodiments described herein provide a method of arbitrating a processing resource. The method includes receiving a command to preempt a task and preventing additional wavefronts associated with the task from being processed. The method also includes evicting currently executing wavefronts associated with the task from being processed based upon predetermined criteria.
    Type: Grant
    Filed: November 30, 2011
    Date of Patent: August 12, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller
  • Publication number: 20140157287
    Abstract: Methods, systems, and computer readable storage media embodiments allow for low overhead context switching of threads. In embodiments, applications, such as, but not limited to, iterative data-parallel applications, substantially reduce the overhead of context switching by adding a user or higher-level program configurability of a state to be saved upon preempting of a executing thread. These methods, systems, and computer readable storage media include aspects of running a group of threads on a processor, saving state information by respective threads in the group in response to a signal from a scheduler, and pre-empting running of the group after the saving of the state information.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 5, 2014
    Applicant: Advanced Micro Devices, Inc
    Inventors: Lee W. HOWES, Benedict R. GASTER, Michael MANTOR
  • Publication number: 20140149677
    Abstract: Embodiments include methods, systems and computer readable media configured to execute a first kernel (e.g. compute or graphics kernel) with reduced intermediate state storage resource requirements. These include executing a first and second (e.g. prefetch) kernel on a data-parallel processor, such that the second kernel begins executing before the first kernel. The second kernel performs memory operations that are based upon at least a subset of memory operations in the first kernel.
    Type: Application
    Filed: November 26, 2012
    Publication date: May 29, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan S. JAYASENA, James Michael O'CONNOR, Michael MANTOR
  • Patent number: 8675003
    Abstract: Disclosed herein are methods, apparatuses, and systems for accessing vertex data stored in a memory, and applications thereof. Such a method includes writing vertex data of primitives into contiguous banks of a memory such that the vertex data of consecutively written primitives spans more than one row of the memory. Vertex data of two consecutively written primitives are read from the memory in a single clock cycle.
    Type: Grant
    Filed: March 24, 2010
    Date of Patent: March 18, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Mantor, Michael Mang, Karl Mann
  • Publication number: 20140022263
    Abstract: The desire to use an Accelerated Processing Device (APD) for general computation has increased due to the APD's exemplary performance characteristics. However, current systems incur high overhead when dispatching work to the APD because a process cannot be efficiently identified or preempted. The occupying of the APD by a rogue process for arbitrary amounts of time can prevent the effective utilization of the available system capacity and can reduce the processing progress of the system. Embodiments described herein can overcome this deficiency by enabling the system software to pre-empt a process executing on the APD for any reason. The APD provides an interface for initiating such a pre-emption. This interface exposes an urgency of the request which determines whether the process being preempted is allowed a grace period to complete its issued work before being forced off the hardware.
    Type: Application
    Filed: July 23, 2012
    Publication date: January 23, 2014
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Kevin McGrath, Sebastien Nussbaum, Nuwan S. Jayasena, Rex Eldon McCrary, Mark Leather, Philip J. Rogers
  • Patent number: 8607247
    Abstract: Method, system, and computer program product embodiments for synchronizing workitems on one or more processors are disclosed. The embodiments include executing a barrier skip instruction by a first workitem from the group, and responsive to the executed barrier skip instruction, reconfiguring a barrier to synchronize other workitems from the group in a plurality of points in a sequence without requiring the first workitem to reach the barrier in any of the plurality of points.
    Type: Grant
    Filed: November 3, 2011
    Date of Patent: December 10, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston, Michael Mantor, Mark Leather, Norman Rubin, Brian D. Emberling
  • Publication number: 20130326524
    Abstract: Disclosed methods, systems, and computer program products embodiments include synchronizing a group of workitems on a processor by storing a respective program counter associated with each of the workitems, selecting at least one first workitem from the group for execution, and executing the selected at least one first workitem on the processor. The selecting is based upon the respective stored program counter associated with the at least one first workitem.
    Type: Application
    Filed: November 8, 2012
    Publication date: December 5, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael C. HOUSTON, Benedict R. Gaster, Lee W. Howes, Michael Mantor, Dominik Behr
  • Publication number: 20130262812
    Abstract: A system and method is provided for improving efficiency, power, and bandwidth consumption in parallel processing. Rather than using memory polling to ensure that enough space is available in memory locations for, for example, write instructions, the techniques disclosed herein provide a system and method to automate this evaluation mechanism in environments such as data-parallel processing to efficiently check available space in memory locations before instructions such as write threads are allowed. These operations are handled efficiently in hardware, but are flexible enough to be implemented in all manner of programming models.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Inventors: Laurent Lefebvre, Michael Mantor
  • Publication number: 20130262834
    Abstract: A system and method is provided for improving efficiency, power, and bandwidth consumption in parallel processing. Rather than requiring memory polling to ensure ordered execution of processes or threads, the techniques disclosed herein provide a system and method to allow any process or thread to run out of order as long as needed, but ensure ordered execution of multiple ordered instructions when needed. These operations are handled efficiently in hardware, but are flexible enough to be implemented in all manner of programming models.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Inventors: Laurent Lefebvre, Michael Mantor
  • Publication number: 20130191852
    Abstract: A system, method, and computer program product are provided for improving resource utilization of multithreaded applications. Rather than requiring threads to block while waiting for data from a channel or requiring context switching to minimize blocking, the techniques disclosed herein provide an event-driven approach to launch kernels only when needed to perform operations on channel data, and then terminate in order to free resources. These operations are handled efficiently in hardware, but are flexible enough to be implemented in all manner of programming models.
    Type: Application
    Filed: September 7, 2012
    Publication date: July 25, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Lee W. Howes, Benedict R. Gaster, Michael Clair Houston, Michael Mantor
  • Patent number: 8478946
    Abstract: Embodiments for a local data share (LDS) unit are described herein. Embodiments include a co-operative set of threads to load data into shared memory so that the threads can have repeated memory access allowing higher memory bandwidth. In this way, data can be shared between related threads in a cooperative manner by providing a re-use of a locality of data from shared registers. Furthermore, embodiments of the invention allow a cooperative set of threads to fetch data in a partitioned manner so that it is only fetched once into a shared memory that can be repeatedly accessed via a separate low latency path.
    Type: Grant
    Filed: September 8, 2010
    Date of Patent: July 2, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Mantor, Michael Mang, Karl Mann
  • Publication number: 20130160017
    Abstract: Embodiments describe herein provide a method of for managing task scheduling on a accelerated processing device. The method includes executing a first task within the accelerated processing device (APD), monitoring for an interruption of the execution of the first task, and switching to a second task when an interruption is detected.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Inventors: Robert Scott HARTOG, Ralph Clay Taylor, Michael Mantor, Thomas Roy Woller, Kevin McGrath, Sebastien Nussbaum, Nuwan S. Jayasena, Rex McCrary, Philip J. Rogers, Mark Leather
  • Publication number: 20130155077
    Abstract: A method of determining priority within an accelerated processing device is provided. The accelerated processing device includes compute pipeline queues that are processed in accordance with predetermined criteria. The queues are selected based on priority characteristics and the selected queue is processed until a time quantum lapses or a queue having a higher priority becomes available for processing.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Robert Scott HARTOG, Mark Leather, Michael Mantor, Rex McCrary, Sebastien Nussbaum, Philip J. Rogers, Ralph Clay Taylor, Thomas Woller
  • Publication number: 20130155079
    Abstract: Provided is a method for processing a command in a computing system including an accelerated processing device (APD) having a command processor. The method includes executing an interrupt routine to save one or more contexts related to a first set of instructions on a shader core in response to an instruction to preempt processing of the first set of instructions.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Inventors: Robert Scott HARTOG, Nuwan JAYASENA, Mark LEATHER, Michael MANTOR, Rex McCRARY, Kevin McGRATH, Sebastien NUSSBAUM, Philip J. ROGERS, Ralph Clay TAYLOR, Thomas R. WOLLER