Patents by Inventor Michael Mantor

Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Write combining cache with pipelined synchronization

Patent number: 8190826

Abstract: Systems and methods for pipelined synchronization in a write-combining cache are described herein. An embodiment to transmit data to a memory to enable pipelined synchronization of a cache includes obtaining a plurality of synchronization events for transactions with said memory, calculating one or more matches between said events and said data stored in one or more cache-lines of said cache, storing event time stamps of events associated with said matches, generating one or more priority values based on said event time stamps, concurrently transmitting said data to said memory based on said priority values.

Type: Grant

Filed: May 28, 2008

Date of Patent: May 29, 2012

Assignee: Advanced Micro Devices, Inc.

Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
Method and System for Synchronizing Thread Wavefront Data and Events

Publication number: 20120131596

Abstract: Systems and methods for synchronizing thread wavefronts and associated events are disclosed. According to an embodiment, a method for synchronizing one or more thread wavefronts and associated events includes inserting a first event associated with a first data output from a first thread wavefront into an event synchronizer. The event synchronizer is configured to release the first event before releasing events inserted subsequent to the first event. The method further includes releasing the first event from the event synchronizer after the first data is stored in the memory. Corresponding system and computer readable medium embodiments are also disclosed.

Type: Application

Filed: November 23, 2010

Publication date: May 24, 2012

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Laurent Lefebvre, Michael Mantor, Deborah Lynne Szasz
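
The ordering rule in the abstract above (an event is released only after its data is stored, and never before an earlier-inserted event) can be illustrated with a small queue model. This is a sketch under assumptions, not the patented design: the class name `EventSynchronizer` and the methods `insert`, `mark_stored`, and `release_ready` are hypothetical names chosen for illustration.

```python
from collections import deque


class EventSynchronizer:
    """Hypothetical model: events leave in insertion order, once their data is stored."""

    def __init__(self):
        self._queue = deque()   # event ids, oldest first
        self._stored = set()    # ids whose associated data has reached memory

    def insert(self, event_id):
        # Record the event at the tail so older events stay ahead of it.
        self._queue.append(event_id)

    def mark_stored(self, event_id):
        # Signal that the data associated with this event is now in memory.
        self._stored.add(event_id)

    def release_ready(self):
        # Release from the head only; a younger event never overtakes an older one.
        released = []
        while self._queue and self._queue[0] in self._stored:
            event_id = self._queue.popleft()
            self._stored.discard(event_id)
            released.append(event_id)
        return released
```

In this model, an event whose data is stored early still waits behind any older event whose data has not yet been stored.
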
Data Output Transfer To Memory

Publication number: 20120110309

Abstract: Methods, systems, and computer readable media for improved transfer of processing data outputs to memory are disclosed. According to an embodiment, a method for transferring outputs of a plurality of threads concurrently executing in one or more processing units to a memory includes: forming, based upon one or more of the outputs, a combined memory export instruction comprising one or more data elements and one or more control elements; and sending the combined memory export instruction to the memory. The combined memory export instruction can be sent to memory in a single clock cycle. Another method includes: forming, based upon outputs from two or more of the threads, a memory export instruction comprising two or more data elements; embedding at least one address representative of the two or more of the outputs in a second memory instruction; and sending the memory export instruction and the second memory instruction to the memory.

Type: Application

Filed: October 29, 2010

Publication date: May 3, 2012

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
Processing Unit that Enables Asynchronous Task Dispatch

Publication number: 20110115802

Abstract: A processing unit that includes a plurality of virtual engines and a shader core. The plurality of virtual engines is configured to (i) receive, from an operating system (OS), a plurality of tasks substantially in parallel with each other and (ii) load a set of state data associated with each of the plurality of tasks. The shader core is configured to execute the plurality of tasks substantially in parallel based on the set of state data associated with each of the plurality of tasks. The processing unit may also include a scheduling module that schedules the plurality of tasks to be issued to the shader core.

Type: Application

Filed: September 1, 2010

Publication date: May 19, 2011

Inventors: Michael Mantor, Rex McCrary
Method And System For Local Data Sharing

Publication number: 20110066813

Abstract: Embodiments for a local data share (LDS) unit are described herein. Embodiments include a co-operative set of threads to load data into shared memory so that the threads can have repeated memory access allowing higher memory bandwidth. In this way, data can be shared between related threads in a cooperative manner by providing a re-use of a locality of data from shared registers. Furthermore, embodiments of the invention allow a cooperative set of threads to fetch data in a partitioned manner so that it is only fetched once into a shared memory that can be repeatedly accessed via a separate low latency path.

Type: Application

Filed: September 8, 2010

Publication date: March 17, 2011

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael Mantor, Michael Mang, Karl Mann
Efficient Data Access for Unified Pixel Interpolation

Publication number: 20110057942

Abstract: Disclosed herein are methods, apparatuses, and systems for accessing vertex data stored in a memory, and applications thereof. Such a method includes writing vertex data of primitives into contiguous banks of a memory such that the vertex data of consecutively written primitives spans more than one row of the memory. Vertex data of two consecutively written primitives are read from the memory in a single clock cycle.

Type: Application

Filed: March 24, 2010

Publication date: March 10, 2011

Inventors: Michael Mantor, Michael Mang, Karl Mann
Interlocked Increment Memory Allocation and Access

Publication number: 20110055511

Abstract: A method of allocating a memory to a plurality of concurrent threads is presented. The method includes dynamically determining writer threads each having at least one pending write to the memory; and dynamically allocating respective contiguous blocks in the memory for each of the writer threads. Another method of allocating a memory to a plurality of concurrent threads includes launching the plurality of threads as a plurality of wavefronts, dynamically determining a group of wavefronts each having at least one thread requiring a write to the memory, and dynamically allocating respective contiguous blocks in the memory for each wavefront from the group of wavefronts.

Type: Application

Filed: September 3, 2009

Publication date: March 3, 2011

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael Mantor, John McCardle, Marcos Zini, Brian Emberling
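
The allocation idea in the abstract above (find the threads that have pending writes, then give each one its own contiguous block) can be sketched as follows. This is a minimal illustration, not the claimed method: the function name `allocate_blocks` and its parameters are hypothetical, and the sequential loop stands in for each writer performing an atomic fetch-and-add on a shared counter, which is an assumption suggested only by the title.

```python
def allocate_blocks(pending_writes, base=0):
    """Assign a contiguous (start, length) block to every thread with pending writes.

    pending_writes[i] is the number of elements thread i still has to write.
    Returns the per-writer allocations and the next free offset.
    """
    allocations = {}
    next_free = base  # models a shared counter advanced by fetch-and-add
    for thread_id, count in enumerate(pending_writes):
        if count > 0:  # only writer threads receive memory
            allocations[thread_id] = (next_free, count)
            next_free += count
    return allocations, next_free
```

Threads with nothing to write consume no memory, so the blocks of the remaining writers are packed back to back.
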
Processing Unit with a Plurality of Shader Engines

Publication number: 20110050716

Abstract: A processor includes a first shader engine and a second shader engine. The first shader engine is configured to process pixel shaders for a first subset of pixels to be displayed on a display device. The second shader engine is configured to process pixel shaders for a second subset of pixels to be displayed on the display device. Both the first and second shader engines are also configured to process general-compute shaders and non-pixel graphics shaders. The processor may also include a level-one (L1) data cache, coupled to and positioned between the first and second shader engines.

Type: Application

Filed: January 21, 2010

Publication date: March 3, 2011

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael Mantor, Ralph C. Taylor, Jeffrey T. Brady
APPARATUS WITH REDUNDANT CIRCUITRY AND METHOD THEREFOR

Publication number: 20100017652

Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.

Type: Application

Filed: July 27, 2009

Publication date: January 21, 2010

Applicant: ATI Technologies ULC

Inventors: Michael Mantor, Ralph Clayton Taylor, Robert Scott Hartog
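
The input and output shifting described in the abstract above can be modeled in software. The sketch below is an illustration under assumptions rather than the patented circuit: `run_with_redundancy` is a hypothetical name, the ALUs are modeled as one Python callable `op`, and physical slot `n` plays the role of the redundant ALU.

```python
def run_with_redundancy(inputs, op, defective=None):
    """Apply op to every lane while routing around one defective ALU.

    Returns the realigned outputs and the list of physical ALU slots used.
    """
    n = len(inputs)

    def physical_slot(lane):
        # Lanes at or past the defective ALU shift one slot over; slot n is the spare.
        if defective is not None and lane >= defective:
            return lane + 1
        return lane

    # Input shifting: route each logical lane to a working physical ALU.
    physical_inputs = {physical_slot(lane): value for lane, value in enumerate(inputs)}
    # Every physical ALU that received data computes its result.
    physical_outputs = {slot: op(value) for slot, value in physical_inputs.items()}
    # Output shifting: move results back to their logical lane positions.
    outputs = [physical_outputs[physical_slot(lane)] for lane in range(n)]
    return outputs, sorted(physical_inputs)
```

With four lanes and ALU 1 marked defective, the work runs on physical slots 0, 2, 3, and 4, and the outputs come back in the original lane order.
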
Write Combining Cache with Pipelined Synchronization

Publication number: 20090300288

Abstract: Systems and methods for pipelined synchronization in a write-combining cache are described herein. An embodiment to transmit data to a memory to enable pipelined synchronization of a cache includes obtaining a plurality of synchronization events for transactions with said memory, calculating one or more matches between said events and said data stored in one or more cache-lines of said cache, storing event time stamps of events associated with said matches, generating one or more priority values based on said event time stamps, concurrently transmitting said data to said memory based on said priority values.

Type: Application

Filed: May 28, 2008

Publication date: December 3, 2009

Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
Apparatus with redundant circuitry and method therefor

Patent number: 7577869

Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.

Type: Grant

Filed: August 11, 2005

Date of Patent: August 18, 2009

Assignee: ATI Technologies ULC

Inventors: Michael Mantor, Ralph Clayton Taylor, Robert Scott Hartog
Efficient State Management System

Publication number: 20090172677

Abstract: The present invention provides an efficient state management system for a complex ASIC, and applications thereof. In an embodiment, a computer-based system executes state-dependent processes. The computer-based system includes a command processor (CP) and a plurality of processing blocks. The CP receives commands in a command stream and manages a global state responsive to global context events in the command stream. The plurality of processing blocks receive the commands in the command stream and manage respective block states responsive to block context events in the command stream. Each respective processing block executes a process on data in a data stream based on the global state and the block state of the respective processing block.

Type: Application

Filed: December 22, 2008

Publication date: July 2, 2009

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael Mantor, Rex Eldon McCrary
Method and apparatus for multi-thread accumulation buffering in a computation engine

Patent number: 7111156

Abstract: A method and apparatus for enhancing flexibility of instruction ordering in a multi-thread processing system that performs multiply and accumulate operations is presented. A plurality of accumulation registers is provided for storing the results of an adder, wherein each of the plurality of accumulation registers corresponds to a different thread of the plurality of threads. The contents of each of the plurality of accumulation registers can be selected as an input to the adder such that the present accumulated value can be added to a subsequently calculated product to generate a new accumulated value.

Type: Grant

Filed: April 21, 2000

Date of Patent: September 19, 2006

Assignee: ATI Technologies, Inc.

Inventors: Michael Andrew Mang, Michael Mantor
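
The per-thread accumulation registers described in the abstract above can be modeled as one accumulator per thread, selected at each multiply-accumulate step. This is a hedged sketch: the class name `MultiThreadMac` and its method `mac` are hypothetical and say nothing about how the hardware is actually organized.

```python
class MultiThreadMac:
    """Hypothetical model of a multiply-accumulate unit with one accumulator per thread."""

    def __init__(self, num_threads):
        self.accumulators = [0] * num_threads

    def mac(self, thread, a, b):
        # Select this thread's accumulator as the adder input, add the new product,
        # and write the sum back to the same accumulator.
        self.accumulators[thread] += a * b
        return self.accumulators[thread]
```

Because each thread has its own register, instructions from different threads can be interleaved in any order without disturbing one another's running sums.
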
3-D rendering texture caching scheme

Publication number: 20060164429

Abstract: A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and texture cache memory is organized in a manner to achieve large reuse of texels with a minimum of cache memory to minimize cache misses. The texture main memory stores a two dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm.

Type: Application

Filed: January 30, 2006

Publication date: July 27, 2006

Inventors: Michael Mantor, John Carey, Ralph Taylor, Thomas Piazza, Jeffrey Potter, Angel Socarras
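
The abstract above says each texel carries one of N identifiers and that the cache is split into N matching banks, but it does not say how identifiers are assigned. The sketch below assumes N = 4 and a 2x2 interleave of texel coordinates, purely for illustration; `texel_identifier` and `banks_for_footprint` are hypothetical names.

```python
def texel_identifier(x, y):
    # Assumed mapping: a 2x2 interleave of texel coordinates, giving N = 4 identifiers.
    return (y % 2) * 2 + (x % 2)


def banks_for_footprint(x, y):
    # Banks touched by the 2x2 block of texels whose upper-left texel is (x, y).
    return sorted(texel_identifier(x + dx, y + dy) for dy in (0, 1) for dx in (0, 1))
```

Under this assumed mapping, any 2x2 block of neighboring texels touches each of the four banks exactly once, so the four texels could be fetched from separate banks.
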
3-D rendering texture caching scheme

Patent number: 7050063

Abstract: A 3D rendering texture caching scheme that minimizes external bandwidth requirements for texture and increases the rate at which textured pixels are available. The texture caching scheme efficiently pre-fetches data at the main memory access granularity and stores it in cache memory. The data in the main memory and texture cache memory is organized in a manner to achieve large reuse of texels with a minimum of cache memory to minimize cache misses. The texture main memory stores a two dimensional array of texels, each texel having an address and one of N identifiers. The texture cache memory has addresses partitioned into N banks, each bank containing texels transferred from the main memory that have the corresponding identifier. A cache controller determines which texels need to be transferred from the texture main memory to the texture cache memory and which texels are currently in the cache using a least most recently used algorithm.

Type: Grant

Filed: February 11, 2000

Date of Patent: May 23, 2006

Assignee: Intel Corporation

Inventors: Michael Mantor, John Austin Carey, Ralph Clayton Taylor, Thomas A. Piazza, Jeffrey D. Potter, Angel E. Socarras
GRAPHICS PROCESSING LOGIC WITH VARIABLE ARITHMETIC LOGIC UNIT CONTROL AND METHOD THEREFOR

Publication number: 20060053189

Abstract: Briefly, graphics data processing logic includes a plurality of parallel arithmetic logic units (ALUs), such as floating point processors or any other suitable logic, that operate as a vector processor on at least one of pixel data and vertex data (or both) and a programmable storage element that contains data representing which of the plurality of arithmetic logic units are not to receive data for processing. The graphics data processing logic also includes parallel ALU data packing logic that is operatively coupled to the plurality of arithmetic logic processing units and to the programmable storage element to pack data only for the plurality of arithmetic logic units identified by the data in the programmable storage element as being enabled.

Type: Application

Filed: August 11, 2005

Publication date: March 9, 2006

Applicant: ATI TECHNOLOGIES INC.

Inventor: Michael Mantor
APPARATUS WITH REDUNDANT CIRCUITRY AND METHOD THEREFOR

Publication number: 20060053188

Abstract: An apparatus with circuit redundancy includes a set of parallel arithmetic logic units (ALUs), a redundant parallel ALU, input data shifting logic that is coupled to the set of parallel ALUs and that is operatively coupled to the redundant parallel ALU. The input data shifting logic shifts input data for a defective ALU, in a first direction, to a neighboring ALU in the set. When the neighboring ALU is the last or end ALU in the set, the shifting logic continues to shift the input data for the end ALU that is not defective, to the redundant parallel ALU. The redundant parallel ALU then operates for the defective ALU. Output data shifting logic is coupled to an output of the parallel redundant ALU and all other ALU outputs to shift the output data in a second and opposite direction than the input shifting logic, to realign output of data for continued processing, including for storage or for further processing by other circuitry.

Type: Application

Filed: August 11, 2005

Publication date: March 9, 2006

Applicant: ATI TECHNOLOGIES INC.

Inventors: Michael Mantor, Ralph Taylor, Robert Hartog
Method and apparatus for primitive processing in a graphics system

Patent number: 6967664

Abstract: A method and apparatus for processing graphics primitives that includes a trivial discard guard band. Such a trivial discard guard band is used for comparison operations with the vertices of graphics primitives to determine whether the graphics primitives can be trivially discarded such that no further processing of the primitives is performed. The trivial discard guard band may be based on the specific dimensions of primitives such as one-half of the width of the line primitives or the radial dimension of point primitives such that the rasterization area of such primitives is taken into account when trivial discard decisions are performed.

Type: Grant

Filed: April 20, 2000

Date of Patent: November 22, 2005

Assignee: ATI International SRL

Inventors: Ralph C. Taylor, Michael Mantor, Michael A. Mang
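
The guard-band test in the abstract above can be sketched as a comparison of every vertex against the viewport edges pushed outward by the guard distance. This is an illustrative sketch with hypothetical names (`can_trivially_discard`, `viewport`, `guard`); it assumes an axis-aligned rectangular viewport and a guard equal to half the line width or the point radius.

```python
def can_trivially_discard(vertices, viewport, guard):
    """Return True if all vertices lie beyond the same guard-band edge.

    vertices: list of (x, y) pairs.
    viewport: (xmin, ymin, xmax, ymax).
    guard: half the line width or the point radius (assumed).
    """
    xmin, ymin, xmax, ymax = viewport
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (
        all(x < xmin - guard for x in xs)
        or all(x > xmax + guard for x in xs)
        or all(y < ymin - guard for y in ys)
        or all(y > ymax + guard for y in ys)
    )
```

A thin line just left of the viewport is discarded, while a wide line with the same endpoints is kept because its rasterized area may still reach visible pixels.
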
Vector engine with pre-accumulation buffer and method therefore

Patent number: 6731294

Abstract: A method and apparatus for reducing latency in pipelined circuits that process dependent operations is presented. In order to reduce latency for dependent operations, a pre-accumulation register is included in an operation pipeline between a first operation unit and a second operation unit. The pre-accumulation register stores a first result produced by the first operation unit during a first operation. When the first operation unit completes a second operation to produce a second result, the first result stored in the pre-accumulation register is presented to the second operation unit along with the second result as input operands.

Type: Grant

Filed: April 21, 2000

Date of Patent: May 4, 2004

Assignee: ATI International SRL

Inventors: Michael Andrew Mang, Michael Mantor
Method and apparatus for memory latency avoidance in a processing system

Patent number: 6728869

Abstract: A method and apparatus for avoiding latency in a processing system that includes a memory for storing intermediate results is presented. The processing system stores results produced by an operation unit in memory, where the results may be used by subsequent dependent operations. In order to avoid the latency of the memory, the output for the operation unit may be routed directly back into the operation unit as a subsequent operand. Furthermore, one or more memory bypass registers are included such that the results produced by the operation unit during recent operations that have not yet satisfied the latency requirements of the memory are also available. A first memory bypass register may thus provide the result of an operation that completed one cycle earlier, a second memory bypass register may provide the result of an operation that completed two cycles earlier, etc.

Type: Grant

Filed: April 21, 2000

Date of Patent: April 27, 2004

Assignee: ATI International Srl

Inventors: Michael Andrew Mang, Michael Mantor, Robert Scott Hartog
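
The memory bypass registers described in the abstract above can be modeled as a short queue of recent results that is searched before the memory itself. The sketch below is an illustration under assumptions: `BypassedMemory` is a hypothetical class, one `write` call stands for one cycle, and `latency` is the number of cycles before a result becomes readable from memory.

```python
from collections import deque


class BypassedMemory:
    """Hypothetical model: recent results stay in bypass registers until memory catches up."""

    def __init__(self, latency):
        self.latency = latency
        self.memory = {}
        self.bypass = deque()  # (address, value) pairs, newest first

    def write(self, address, value):
        # One call models one cycle: the newest result enters the first bypass register.
        self.bypass.appendleft((address, value))
        if len(self.bypass) > self.latency:
            # The oldest result has satisfied the memory latency and retires to memory.
            old_address, old_value = self.bypass.pop()
            self.memory[old_address] = old_value

    def read(self, address):
        # Check the bypass registers first, newest result first, then fall back to memory.
        for bypass_address, value in self.bypass:
            if bypass_address == address:
                return value
        return self.memory[address]
```

A dependent operation can therefore read a result in the cycle after it is produced, without waiting for the memory write to complete.
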
