Patents by Inventor Michael Mantor

Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120194524
    Abstract: Methods, systems, and computer readable media embodiments are disclosed for preemptive context-switching of processes running on a accelerated processing device. Embodiments include, detecting by an accelerated processing device a memory exception, and preempting a process from running on the accelerated processing device based upon the detected exception.
    Type: Application
    Filed: November 4, 2011
    Publication date: August 2, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Kevin McGrath, Sebastien Nussbaum, Nuwan Jayasena, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller
  • Publication number: 20120194527
    Abstract: Embodiments described herein provide a method of arbitrating a processing resource. The method includes receiving a command to preempt a task and preventing additional wavefronts associated with the task from being processed.
    Type: Application
    Filed: November 30, 2011
    Publication date: August 2, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller
  • Publication number: 20120198458
    Abstract: Embodiments of the present invention provide a method of synchronous operation of a first processing device and a second processing device. The method includes executing a process on the first processing device, responsive to a determination that execution of the process on the first device has reached a serial-parallel boundary, passing an execution thread of the process from the first processing device to the second processing device, and executing the process on the second processing device.
    Type: Application
    Filed: November 30, 2011
    Publication date: August 2, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Nuwan S. Jayasena, Kevin McGrath, Philip j. Rogers, Thomas Woller
  • Publication number: 20120194528
    Abstract: Embodiments of the present invention provide a method of preempting a task. The method includes removing the task from the parallel processors via a scheduling mechanism. Responsive to the removing, the method also includes ceasing (i) retrieval of commands from a buffer associated with the task, (ii) dispatch of groups of work-items associated with the task, (iii) dispatch of wavefronts associated with the task, and (iiii) execution of the wavefronts. State information related to the task is saved.
    Type: Application
    Filed: November 30, 2011
    Publication date: August 2, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller, Kevin McGrath, Nuwan Jayasena
  • Publication number: 20120188259
    Abstract: Embodiments described herein provide a method including receiving a command to schedule a first process and selecting a command queue associated with the first process. The method also includes scheduling the first process to run on an accelerated processing device and preempting a second process running on the accelerated processing device to allow the first process to run on the accelerated processing device.
    Type: Application
    Filed: November 23, 2011
    Publication date: July 26, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Thomas Woller, Kevin McGrath, Sebastien Nussbaum, Nuwan Jayasena, Rex McCrary, Philip Rogers, Mark Leather
  • Patent number: 8190826
    Abstract: Systems and methods for pipelined synchronization in a write-combining cache are described herein. An embodiment to transmit data to a memory to enable pipelined synchronization of a cache includes obtaining a plurality of synchronization events for transactions with said memory, calculating one or more matches between said events and said data stored in one or more cache-lines of said cache, storing event time stamps of events associated with said matches, generating one or more priority values based on said event time stamps, concurrently transmitting said data to said memory based on said priority values.
    Type: Grant
    Filed: May 28, 2008
    Date of Patent: May 29, 2012
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
  • Publication number: 20120131596
    Abstract: Systems and methods for synchronizing thread wavefronts and associated events are disclosed. According to an embodiment, a method for synchronizing one or more thread wavefronts and associated events includes inserting a first event associated with a first data output from a first thread wavefront into an event synchronizer. The event synchronizer is configured to release the first event before releasing events inserted subsequent to the first event. The method further includes releasing the first event from the event synchronizer after the first data is stored in the memory. Corresponding system and computer readable medium embodiments are also disclosed.
    Type: Application
    Filed: November 23, 2010
    Publication date: May 24, 2012
    Applicants: Advance Micro Devices, Inc., ATI Technologies ULC
    Inventors: Laurent LEFEBVRE, Michael Mantor, Deborah Lynne Szasz
  • Publication number: 20120110309
    Abstract: Methods, systems, and computer readable media for improved transfer of processing data outputs to memory are disclosed. According to an embodiment, a method for transferring outputs of a plurality of threads concurrently executing in one or more processing units to a memory includes: forming, based upon one or more of the outputs, a combined memory export instruction comprising one or more data elements and one or more control elements; and sending the combined memory export instruction to the memory. The combined memory export instruction can be sent to memory in a single clock cycle. Another method includes: forming, based upon outputs from two or more of the threads, a memory export instruction comprising two or more data elements; embedding at least one address representative of the two or more of the outputs in a second memory instruction; and sending the memory export instruction and the second memory instruction to the memory.
    Type: Application
    Filed: October 29, 2010
    Publication date: May 3, 2012
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
  • Publication number: 20110115802
    Abstract: A processing unit that includes a plurality of virtual engines and a shader core. The plurality of virtual engines is configured to (i) receive, from an operating system (OS), a plurality of tasks substantially in parallel with each other and (ii) load a set of state data associated with each of the plurality of tasks. The shader core is configured to execute the plurality of tasks substantially in parallel based on the set of state data associated with each of the plurality of tasks. The processing unit may also include a scheduling module that schedules the plurality of tasks to be issued to the shader core.
    Type: Application
    Filed: September 1, 2010
    Publication date: May 19, 2011
    Inventors: Michael MANTOR, Rex McCrary
  • Publication number: 20110066813
    Abstract: Embodiments for a local data share (LDS) unit are described herein. Embodiments include a co-operative set of threads to load data into shared memory so that the threads can have repeated memory access allowing higher memory bandwidth. In this way, data can be shared between related threads in a cooperative manner by providing a re-use of a locality of data from shared registers. Furthermore, embodiments of the invention allow a cooperative set of threads to fetch data in a partitioned manner so that it is only fetched once into a shared memory that can be repeatedly accessed via a separate low latency path.
    Type: Application
    Filed: September 8, 2010
    Publication date: March 17, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, Michael MANG, Karl MANN
  • Publication number: 20110057942
    Abstract: Disclosed herein are methods, apparatuses, and systems for accessing vertex data stored in a memory, and applications thereof. Such a method includes writing vertex data of primitives into contiguous banks of a memory such that the vertex data of consecutively written primitives spans more than one row of the memory. Vertex data of two consecutively written primitives are read from the memory in a single clock cycle.
    Type: Application
    Filed: March 24, 2010
    Publication date: March 10, 2011
    Inventors: Michael Mantor, Michael Mang, Karl Mann
  • Publication number: 20110055511
    Abstract: A method of allocating a memory to a plurality of concurrent threads is presented. The method includes dynamically determining writer threads each having at least one pending write to the memory; and dynamically allocating respective contiguous blocks in the memory for each of the writer threads. Another method of allocating a memory to a plurality of concurrent threads includes launching the plurality of threads as a plurality of wavefronts, dynamically determining a group of wavefronts each having at least one thread requiring a write to the memory, and dynamically allocating respective contiguous blocks in the memory for each wavefront from the group of wavefronts.
    Type: Application
    Filed: September 3, 2009
    Publication date: March 3, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, John MCCARDLE, Marcos ZINI, Brian EMBERLING
  • Publication number: 20110050716
    Abstract: A processor includes a first shader engine and a second shader engine. The first shader engine is configured to process pixel shaders for a first subset of pixels to be displayed on a display device. The second shader engine is configured to process pixel shaders for a second subset of pixels to be displayed on the display device. Both the first and second shader engines are also configured to process general-compute shaders and non-pixel graphics shaders. The processor may also include a level-one (L1) data cache, coupled to and positioned between the first and second shader engines.
    Type: Application
    Filed: January 21, 2010
    Publication date: March 3, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, Ralph C. Taylor, Jeffrey T. Brady
  • Publication number: 20100017652
    Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.
    Type: Application
    Filed: July 27, 2009
    Publication date: January 21, 2010
    Applicant: ATI Technologies ULC
    Inventors: Michael Mantor, Ralph Clayton Taylor, Robert Scott Hartog
  • Publication number: 20090300288
    Abstract: Systems and methods for pipelined synchronization in a write-combining cache are described herein. An embodiment to transmit data to a memory to enable pipelined synchronization of a cache includes obtaining a plurality of synchronization events for transactions with said memory, calculating one or more matches between said events and said data stored in one or more cache-lines of said cache, storing event time stamps of events associated with said matches, generating one or more priority values based on said event time stamps, concurrently transmitting said data to said memory based on said priority values.
    Type: Application
    Filed: May 28, 2008
    Publication date: December 3, 2009
    Inventors: Laurent LEFEBVRE, Michael Mantor, Robert Hankinson
  • Patent number: 7577869
    Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.
    Type: Grant
    Filed: August 11, 2005
    Date of Patent: August 18, 2009
    Assignee: ATI Technologies ULC
    Inventors: Michael Mantor, Ralph Clayton Taylor, Robert Scott Hartog
  • Publication number: 20090172677
    Abstract: The present invention provides an efficient state management system for a complex ASIC, and applications thereof. In an embodiment, a computer-based system executes state-dependent processes. The computer-based system includes a command processor (CP) and a plurality of processing blocks. The CP receives commands in a command stream and manages a global state responsive to global context events in the command stream. The plurality of processing blocks receive the commands in the command stream and manage respective block states responsive to block context events in the command stream. Each respective processing block executes a process on data in a data stream based on the global state and the block state of the respective processing block.
    Type: Application
    Filed: December 22, 2008
    Publication date: July 2, 2009
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, Rex Eldon MCCRARY
  • Patent number: 7111156
    Abstract: A method and apparatus for enhancing flexibility of instruction ordering in a multi-thread processing system that performs multiply and accumulate operations is presented. A plurality of accumulation registers is provided for storing the results of an adder, wherein each of the plurality of accumulation registers corresponds to a different thread of the plurality of threads. The contents of each of the plurality of accumulation registers can be selected as an input to the adder such that the present accumulated value can be added to a subsequently calculated produce to generate a new accumulated value.
    Type: Grant
    Filed: April 21, 2000
    Date of Patent: September 19, 2006
    Assignee: ATI Technologies, Inc.
    Inventors: Michael Andrew Mang, Michael Mantor
  • Publication number: 20060164429
    Abstract: A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and texture cache memory is organized in a manner to achieve large reuse of texels with a minimum of cache memory to minimize cache misses. The texture main memory stores a two dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm.
    Type: Application
    Filed: January 30, 2006
    Publication date: July 27, 2006
    Inventors: Michael Mantor, John Carey, Ralph Taylor, Thomas Piazza, Jeffrey Potter, Angel Socarras
  • Patent number: 7050063
    Abstract: A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and texture cache memory is organized in a manner to achieve large reuse of texels with a minimum of cache memory to minimize cache misses. The texture main memory stores a two dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm.
    Type: Grant
    Filed: February 11, 2000
    Date of Patent: May 23, 2006
    Assignee: Intel Corporation
    Inventors: Michael Mantor, John Austin Carey, Ralph Clayton Taylor, Thomas A. Piazza, Jeffrey D. Potter, Angel E. Socarras