Patents by Inventor Michael Mantor

Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8190826
    Abstract: Systems and methods for pipelined synchronization in a write-combining cache are described herein. An embodiment to transmit data to a memory to enable pipelined synchronization of a cache includes obtaining a plurality of synchronization events for transactions with said memory, calculating one or more matches between said events and said data stored in one or more cache-lines of said cache, storing event time stamps of events associated with said matches, generating one or more priority values based on said event time stamps, concurrently transmitting said data to said memory based on said priority values.
    Type: Grant
    Filed: May 28, 2008
    Date of Patent: May 29, 2012
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
  • Publication number: 20120131596
    Abstract: Systems and methods for synchronizing thread wavefronts and associated events are disclosed. According to an embodiment, a method for synchronizing one or more thread wavefronts and associated events includes inserting a first event associated with a first data output from a first thread wavefront into an event synchronizer. The event synchronizer is configured to release the first event before releasing events inserted subsequent to the first event. The method further includes releasing the first event from the event synchronizer after the first data is stored in the memory. Corresponding system and computer readable medium embodiments are also disclosed.
    Type: Application
    Filed: November 23, 2010
    Publication date: May 24, 2012
    Applicants: Advance Micro Devices, Inc., ATI Technologies ULC
    Inventors: Laurent LEFEBVRE, Michael Mantor, Deborah Lynne Szasz
  • Publication number: 20120110309
    Abstract: Methods, systems, and computer readable media for improved transfer of processing data outputs to memory are disclosed. According to an embodiment, a method for transferring outputs of a plurality of threads concurrently executing in one or more processing units to a memory includes: forming, based upon one or more of the outputs, a combined memory export instruction comprising one or more data elements and one or more control elements; and sending the combined memory export instruction to the memory. The combined memory export instruction can be sent to memory in a single clock cycle. Another method includes: forming, based upon outputs from two or more of the threads, a memory export instruction comprising two or more data elements; embedding at least one address representative of the two or more of the outputs in a second memory instruction; and sending the memory export instruction and the second memory instruction to the memory.
    Type: Application
    Filed: October 29, 2010
    Publication date: May 3, 2012
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
  • Publication number: 20110115802
    Abstract: A processing unit that includes a plurality of virtual engines and a shader core. The plurality of virtual engines is configured to (i) receive, from an operating system (OS), a plurality of tasks substantially in parallel with each other and (ii) load a set of state data associated with each of the plurality of tasks. The shader core is configured to execute the plurality of tasks substantially in parallel based on the set of state data associated with each of the plurality of tasks. The processing unit may also include a scheduling module that schedules the plurality of tasks to be issued to the shader core.
    Type: Application
    Filed: September 1, 2010
    Publication date: May 19, 2011
    Inventors: Michael MANTOR, Rex McCrary
  • Publication number: 20110066813
    Abstract: Embodiments for a local data share (LDS) unit are described herein. Embodiments include a co-operative set of threads to load data into shared memory so that the threads can have repeated memory access allowing higher memory bandwidth. In this way, data can be shared between related threads in a cooperative manner by providing a re-use of a locality of data from shared registers. Furthermore, embodiments of the invention allow a cooperative set of threads to fetch data in a partitioned manner so that it is only fetched once into a shared memory that can be repeatedly accessed via a separate low latency path.
    Type: Application
    Filed: September 8, 2010
    Publication date: March 17, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, Michael MANG, Karl MANN
  • Publication number: 20110057942
    Abstract: Disclosed herein are methods, apparatuses, and systems for accessing vertex data stored in a memory, and applications thereof. Such a method includes writing vertex data of primitives into contiguous banks of a memory such that the vertex data of consecutively written primitives spans more than one row of the memory. Vertex data of two consecutively written primitives are read from the memory in a single clock cycle.
    Type: Application
    Filed: March 24, 2010
    Publication date: March 10, 2011
    Inventors: Michael Mantor, Michael Mang, Karl Mann
  • Publication number: 20110055511
    Abstract: A method of allocating a memory to a plurality of concurrent threads is presented. The method includes dynamically determining writer threads each having at least one pending write to the memory; and dynamically allocating respective contiguous blocks in the memory for each of the writer threads. Another method of allocating a memory to a plurality of concurrent threads includes launching the plurality of threads as a plurality of wavefronts, dynamically determining a group of wavefronts each having at least one thread requiring a write to the memory, and dynamically allocating respective contiguous blocks in the memory for each wavefront from the group of wavefronts.
    Type: Application
    Filed: September 3, 2009
    Publication date: March 3, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, John MCCARDLE, Marcos ZINI, Brian EMBERLING
  • Publication number: 20110050716
    Abstract: A processor includes a first shader engine and a second shader engine. The first shader engine is configured to process pixel shaders for a first subset of pixels to be displayed on a display device. The second shader engine is configured to process pixel shaders for a second subset of pixels to be displayed on the display device. Both the first and second shader engines are also configured to process general-compute shaders and non-pixel graphics shaders. The processor may also include a level-one (L1) data cache, coupled to and positioned between the first and second shader engines.
    Type: Application
    Filed: January 21, 2010
    Publication date: March 3, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, Ralph C. Taylor, Jeffrey T. Brady
  • Publication number: 20100017652
    Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.
    Type: Application
    Filed: July 27, 2009
    Publication date: January 21, 2010
    Applicant: ATI Technologies ULC
    Inventors: Michael Mantor, Ralph Clayton Taylor, Robert Scott Hartog
  • Publication number: 20090300288
    Abstract: Systems and methods for pipelined synchronization in a write-combining cache are described herein. An embodiment to transmit data to a memory to enable pipelined synchronization of a cache includes obtaining a plurality of synchronization events for transactions with said memory, calculating one or more matches between said events and said data stored in one or more cache-lines of said cache, storing event time stamps of events associated with said matches, generating one or more priority values based on said event time stamps, concurrently transmitting said data to said memory based on said priority values.
    Type: Application
    Filed: May 28, 2008
    Publication date: December 3, 2009
    Inventors: Laurent LEFEBVRE, Michael Mantor, Robert Hankinson
  • Patent number: 7577869
    Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.
    Type: Grant
    Filed: August 11, 2005
    Date of Patent: August 18, 2009
    Assignee: ATI Technologies ULC
    Inventors: Michael Mantor, Ralph Clayton Taylor, Robert Scott Hartog
  • Publication number: 20090172677
    Abstract: The present invention provides an efficient state management system for a complex ASIC, and applications thereof. In an embodiment, a computer-based system executes state-dependent processes. The computer-based system includes a command processor (CP) and a plurality of processing blocks. The CP receives commands in a command stream and manages a global state responsive to global context events in the command stream. The plurality of processing blocks receive the commands in the command stream and manage respective block states responsive to block context events in the command stream. Each respective processing block executes a process on data in a data stream based on the global state and the block state of the respective processing block.
    Type: Application
    Filed: December 22, 2008
    Publication date: July 2, 2009
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael MANTOR, Rex Eldon MCCRARY
  • Patent number: 7111156
    Abstract: A method and apparatus for enhancing flexibility of instruction ordering in a multi-thread processing system that performs multiply and accumulate operations is presented. A plurality of accumulation registers is provided for storing the results of an adder, wherein each of the plurality of accumulation registers corresponds to a different thread of the plurality of threads. The contents of each of the plurality of accumulation registers can be selected as an input to the adder such that the present accumulated value can be added to a subsequently calculated produce to generate a new accumulated value.
    Type: Grant
    Filed: April 21, 2000
    Date of Patent: September 19, 2006
    Assignee: ATI Technologies, Inc.
    Inventors: Michael Andrew Mang, Michael Mantor
  • Publication number: 20060164429
    Abstract: A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and texture cache memory is organized in a manner to achieve large reuse of texels with a minimum of cache memory to minimize cache misses. The texture main memory stores a two dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm.
    Type: Application
    Filed: January 30, 2006
    Publication date: July 27, 2006
    Inventors: Michael Mantor, John Carey, Ralph Taylor, Thomas Piazza, Jeffrey Potter, Angel Socarras
  • Patent number: 7050063
    Abstract: A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and texture cache memory is organized in a manner to achieve large reuse of texels with a minimum of cache memory to minimize cache misses. The texture main memory stores a two dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm.
    Type: Grant
    Filed: February 11, 2000
    Date of Patent: May 23, 2006
    Assignee: Intel Corporation
    Inventors: Michael Mantor, John Austin Carey, Ralph Clayton Taylor, Thomas A. Piazza, Jeffrey D. Potter, Angel E. Socarras
  • Publication number: 20060053189
    Abstract: Briefly, graphics data processing logic includes a plurality of parallel arithmetic logic units (ALUs), such as floating point processors or any other suitable logic, that operate as a vector processor on at least one of pixel data and vertex data (or both) and a programmable storage element that contains data representing which of the plurality of arithmetic logic units are not to receive data for processing. The graphics data processing logic also includes parallel ALU data packing logic that is operatively coupled to the plurality of arithmetic logic processing units and to the programmable storage element to pack data only for the plurality of arithmetic logic units identified by the data in the programmable storage element as being enabled.
    Type: Application
    Filed: August 11, 2005
    Publication date: March 9, 2006
    Applicant: ATI TECHNOLOGIES INC.
    Inventor: Michael Mantor
  • Publication number: 20060053188
    Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.
    Type: Application
    Filed: August 11, 2005
    Publication date: March 9, 2006
    Applicant: ATI TECHNOLOGIES INC.
    Inventors: Michael Mantor, Ralph Taylor, Robert Hartog
  • Patent number: 6967664
    Abstract: A method and apparatus for processing graphics primitives that includes a trivial discard guard band. Such a trivial discard guard band is used for comparison operations with the vertices of graphics primitives to determine whether the graphics primitives can be trivially discarded such that no further processing of the primitives is performed. The trivial discard guard band may be based on the specific dimensions of primitives such as one-half of the width of the line primitives or the radial dimension of point primitives such that the rasterization area of such primitives is taken into account when trivial discard decisions are performed.
    Type: Grant
    Filed: April 20, 2000
    Date of Patent: November 22, 2005
    Assignee: ATI International SRL
    Inventors: Ralph C. Taylor, Michael Mantor, Michael A. Mang
  • Patent number: 6731294
    Abstract: A method and apparatus for reducing latency in pipelined circuits that process dependent operations is presented. In order to reduce latency for dependent operations, a pre-accumulation register is included in an operation pipeline between a first operation unit and a second operation unit. The pre-accumulation register stores a first result produced by the first operation unit during a first operation. When the first operation unit completes a second operation to produce a second result, the first result stored in the pre-accumulation register is presented to the second operation unit along with the second result as input operands.
    Type: Grant
    Filed: April 21, 2000
    Date of Patent: May 4, 2004
    Assignee: ATI International SRL
    Inventors: Michael Andrew Mang, Michael Mantor
  • Patent number: 6728869
    Abstract: A method and apparatus for avoiding latency in a processing system that includes a memory for storing intermediate results is presented. The processing system stores results produced by an operation unit in memory, where the results may be used by subsequent dependent operations. In order to avoid the latency of the memory, the output for the operation unit may be routed directly back into the operation unit as a subsequent operand. Furthermore, one or more memory bypass registers are included such that the results produced by the operation unit during recent operations that have not yet satisfied the latency requirements of the memory are also available. A first memory bypass register may thus provide the result of an operation that completed one cycle earlier, a second memory bypass register may provide the result of an operation that completed two cycles earlier, etc.
    Type: Grant
    Filed: April 21, 2000
    Date of Patent: April 27, 2004
    Assignee: ATI International Srl
    Inventors: Michael Andrew Mang, Michael Mantor, Robert Scott Hartog