Patents by Inventor Michael Mantor

Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Preemptive Context Switching

Publication number: 20120194524

Abstract: Methods, systems, and computer readable media embodiments are disclosed for preemptive context-switching of processes running on a accelerated processing device. Embodiments include, detecting by an accelerated processing device a memory exception, and preempting a process from running on the accelerated processing device based upon the detected exception.

Type: Application

Filed: November 4, 2011

Publication date: August 2, 2012

Applicant: Advanced Micro Devices, Inc.

Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Kevin McGrath, Sebastien Nussbaum, Nuwan Jayasena, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller
Method for Preempting Graphics Tasks to Accommodate Compute Tasks in an Accelerated Processing Device (APD)

Publication number: 20120194527

Abstract: Embodiments described herein provide a method of arbitrating a processing resource. The method includes receiving a command to preempt a task and preventing additional wavefronts associated with the task from being processed.

Type: Application

Filed: November 30, 2011

Publication date: August 2, 2012

Applicant: Advanced Micro Devices, Inc.

Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller
Methods and Systems for Synchronous Operation of a Processing Device

Publication number: 20120198458

Abstract: Embodiments of the present invention provide a method of synchronous operation of a first processing device and a second processing device. The method includes executing a process on the first processing device, responsive to a determination that execution of the process on the first device has reached a serial-parallel boundary, passing an execution thread of the process from the first processing device to the second processing device, and executing the process on the second processing device.

Type: Application

Filed: November 30, 2011

Publication date: August 2, 2012

Applicant: Advanced Micro Devices, Inc.

Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Nuwan S. Jayasena, Kevin McGrath, Philip j. Rogers, Thomas Woller
Method and System for Context Switching

Publication number: 20120194528

Abstract: Embodiments of the present invention provide a method of preempting a task. The method includes removing the task from the parallel processors via a scheduling mechanism. Responsive to the removing, the method also includes ceasing (i) retrieval of commands from a buffer associated with the task, (ii) dispatch of groups of work-items associated with the task, (iii) dispatch of wavefronts associated with the task, and (iiii) execution of the wavefronts. State information related to the task is saved.

Type: Application

Filed: November 30, 2011

Publication date: August 2, 2012

Applicant: Advanced Micro Devices, Inc.

Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller, Kevin McGrath, Nuwan Jayasena
Mechanisms for Enabling Task Scheduling

Publication number: 20120188259

Abstract: Embodiments described herein provide a method including receiving a command to schedule a first process and selecting a command queue associated with the first process. The method also includes scheduling the first process to run on an accelerated processing device and preempting a second process running on the accelerated processing device to allow the first process to run on the accelerated processing device.

Type: Application

Filed: November 23, 2011

Publication date: July 26, 2012

Applicant: Advanced Micro Devices, Inc.

Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Thomas Woller, Kevin McGrath, Sebastien Nussbaum, Nuwan Jayasena, Rex McCrary, Philip Rogers, Mark Leather
Write combining cache with pipelined synchronization

Patent number: 8190826

Abstract: Systems and methods for pipelined synchronization in a write-combining cache are described herein. An embodiment to transmit data to a memory to enable pipelined synchronization of a cache includes obtaining a plurality of synchronization events for transactions with said memory, calculating one or more matches between said events and said data stored in one or more cache-lines of said cache, storing event time stamps of events associated with said matches, generating one or more priority values based on said event time stamps, concurrently transmitting said data to said memory based on said priority values.

Type: Grant

Filed: May 28, 2008

Date of Patent: May 29, 2012

Assignee: Advanced Micro Devices, Inc.

Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
Method and System for Synchronizing Thread Wavefront Data and Events

Publication number: 20120131596

Abstract: Systems and methods for synchronizing thread wavefronts and associated events are disclosed. According to an embodiment, a method for synchronizing one or more thread wavefronts and associated events includes inserting a first event associated with a first data output from a first thread wavefront into an event synchronizer. The event synchronizer is configured to release the first event before releasing events inserted subsequent to the first event. The method further includes releasing the first event from the event synchronizer after the first data is stored in the memory. Corresponding system and computer readable medium embodiments are also disclosed.

Type: Application

Filed: November 23, 2010

Publication date: May 24, 2012

Applicants: Advance Micro Devices, Inc., ATI Technologies ULC

Inventors: Laurent LEFEBVRE, Michael Mantor, Deborah Lynne Szasz
Data Output Transfer To Memory

Publication number: 20120110309

Abstract: Methods, systems, and computer readable media for improved transfer of processing data outputs to memory are disclosed. According to an embodiment, a method for transferring outputs of a plurality of threads concurrently executing in one or more processing units to a memory includes: forming, based upon one or more of the outputs, a combined memory export instruction comprising one or more data elements and one or more control elements; and sending the combined memory export instruction to the memory. The combined memory export instruction can be sent to memory in a single clock cycle. Another method includes: forming, based upon outputs from two or more of the threads, a memory export instruction comprising two or more data elements; embedding at least one address representative of the two or more of the outputs in a second memory instruction; and sending the memory export instruction and the second memory instruction to the memory.

Type: Application

Filed: October 29, 2010

Publication date: May 3, 2012

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
Processing Unit that Enables Asynchronous Task Dispatch

Publication number: 20110115802

Abstract: A processing unit that includes a plurality of virtual engines and a shader core. The plurality of virtual engines is configured to (i) receive, from an operating system (OS), a plurality of tasks substantially in parallel with each other and (ii) load a set of state data associated with each of the plurality of tasks. The shader core is configured to execute the plurality of tasks substantially in parallel based on the set of state data associated with each of the plurality of tasks. The processing unit may also include a scheduling module that schedules the plurality of tasks to be issued to the shader core.

Type: Application

Filed: September 1, 2010

Publication date: May 19, 2011

Inventors: Michael MANTOR, Rex McCrary
Method And System For Local Data Sharing

Publication number: 20110066813

Abstract: Embodiments for a local data share (LDS) unit are described herein. Embodiments include a co-operative set of threads to load data into shared memory so that the threads can have repeated memory access allowing higher memory bandwidth. In this way, data can be shared between related threads in a cooperative manner by providing a re-use of a locality of data from shared registers. Furthermore, embodiments of the invention allow a cooperative set of threads to fetch data in a partitioned manner so that it is only fetched once into a shared memory that can be repeatedly accessed via a separate low latency path.

Type: Application

Filed: September 8, 2010

Publication date: March 17, 2011

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael MANTOR, Michael MANG, Karl MANN
Efficient Data Access for Unified Pixel Interpolation

Publication number: 20110057942

Abstract: Disclosed herein are methods, apparatuses, and systems for accessing vertex data stored in a memory, and applications thereof. Such a method includes writing vertex data of primitives into contiguous banks of a memory such that the vertex data of consecutively written primitives spans more than one row of the memory. Vertex data of two consecutively written primitives are read from the memory in a single clock cycle.

Type: Application

Filed: March 24, 2010

Publication date: March 10, 2011

Inventors: Michael Mantor, Michael Mang, Karl Mann
Interlocked Increment Memory Allocation and Access

Publication number: 20110055511

Abstract: A method of allocating a memory to a plurality of concurrent threads is presented. The method includes dynamically determining writer threads each having at least one pending write to the memory; and dynamically allocating respective contiguous blocks in the memory for each of the writer threads. Another method of allocating a memory to a plurality of concurrent threads includes launching the plurality of threads as a plurality of wavefronts, dynamically determining a group of wavefronts each having at least one thread requiring a write to the memory, and dynamically allocating respective contiguous blocks in the memory for each wavefront from the group of wavefronts.

Type: Application

Filed: September 3, 2009

Publication date: March 3, 2011

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael MANTOR, John MCCARDLE, Marcos ZINI, Brian EMBERLING
Processing Unit with a Plurality of Shader Engines

Publication number: 20110050716

Abstract: A processor includes a first shader engine and a second shader engine. The first shader engine is configured to process pixel shaders for a first subset of pixels to be displayed on a display device. The second shader engine is configured to process pixel shaders for a second subset of pixels to be displayed on the display device. Both the first and second shader engines are also configured to process general-compute shaders and non-pixel graphics shaders. The processor may also include a level-one (L1) data cache, coupled to and positioned between the first and second shader engines.

Type: Application

Filed: January 21, 2010

Publication date: March 3, 2011

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael MANTOR, Ralph C. Taylor, Jeffrey T. Brady
APPARATUS WITH REDUNDANT CIRCUITRY AND METHOD THEREFOR

Publication number: 20100017652

Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.

Type: Application

Filed: July 27, 2009

Publication date: January 21, 2010

Applicant: ATI Technologies ULC

Inventors: Michael Mantor, Ralph Clayton Taylor, Robert Scott Hartog
Write Combining Cache with Pipelined Synchronization

Publication number: 20090300288

Abstract: Systems and methods for pipelined synchronization in a write-combining cache are described herein. An embodiment to transmit data to a memory to enable pipelined synchronization of a cache includes obtaining a plurality of synchronization events for transactions with said memory, calculating one or more matches between said events and said data stored in one or more cache-lines of said cache, storing event time stamps of events associated with said matches, generating one or more priority values based on said event time stamps, concurrently transmitting said data to said memory based on said priority values.

Type: Application

Filed: May 28, 2008

Publication date: December 3, 2009

Inventors: Laurent LEFEBVRE, Michael Mantor, Robert Hankinson
Apparatus with redundant circuitry and method therefor

Patent number: 7577869

Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.

Type: Grant

Filed: August 11, 2005

Date of Patent: August 18, 2009

Assignee: ATI Technologies ULC

Inventors: Michael Mantor, Ralph Clayton Taylor, Robert Scott Hartog
Efficient State Management System

Publication number: 20090172677

Abstract: The present invention provides an efficient state management system for a complex ASIC, and applications thereof. In an embodiment, a computer-based system executes state-dependent processes. The computer-based system includes a command processor (CP) and a plurality of processing blocks. The CP receives commands in a command stream and manages a global state responsive to global context events in the command stream. The plurality of processing blocks receive the commands in the command stream and manage respective block states responsive to block context events in the command stream. Each respective processing block executes a process on data in a data stream based on the global state and the block state of the respective processing block.

Type: Application

Filed: December 22, 2008

Publication date: July 2, 2009

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael MANTOR, Rex Eldon MCCRARY
Method and apparatus for multi-thread accumulation buffering in a computation engine

Patent number: 7111156

Abstract: A method and apparatus for enhancing flexibility of instruction ordering in a multi-thread processing system that performs multiply and accumulate operations is presented. A plurality of accumulation registers is provided for storing the results of an adder, wherein each of the plurality of accumulation registers corresponds to a different thread of the plurality of threads. The contents of each of the plurality of accumulation registers can be selected as an input to the adder such that the present accumulated value can be added to a subsequently calculated produce to generate a new accumulated value.

Type: Grant

Filed: April 21, 2000

Date of Patent: September 19, 2006

Assignee: ATI Technologies, Inc.

Inventors: Michael Andrew Mang, Michael Mantor
3-D rendering texture caching scheme

Publication number: 20060164429

Abstract: A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and texture cache memory is organized in a manner to achieve large reuse of texels with a minimum of cache memory to minimize cache misses. The texture main memory stores a two dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm.

Type: Application

Filed: January 30, 2006

Publication date: July 27, 2006

Inventors: Michael Mantor, John Carey, Ralph Taylor, Thomas Piazza, Jeffrey Potter, Angel Socarras
3-D rendering texture caching scheme

Patent number: 7050063

Abstract: A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and texture cache memory is organized in a manner to achieve large reuse of texels with a minimum of cache memory to minimize cache misses. The texture main memory stores a two dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm.

Type: Grant

Filed: February 11, 2000

Date of Patent: May 23, 2006

Assignee: Intel Corporation

Inventors: Michael Mantor, John Austin Carey, Ralph Clayton Taylor, Thomas A. Piazza, Jeffrey D. Potter, Angel E. Socarras

prev … 6 7 8 9 10 11 12 next