Patents by Inventor Michael J. Mantor
Michael J. Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9311102Abstract: Systems and methods to improve performance in a graphics processing unit are described herein. Embodiments achieve power saving in a graphics processing unit by dynamically activating/deactivating individual SIMDs in a shader complex that comprises multiple SIMD units. On-the-fly dynamic disabling and enabling of individual SIMDs provides flexibility in achieving a required performance and power level for a given processing application. Embodiments of the invention also achieve dynamic medium grain clock gating of SIMDs in a shader complex. Embodiments reduce switching power by shutting down clock trees to unused logic by providing a clock on demand mechanism. In this way, embodiments enhance clock gating to save more switching power for the duration of time when SIMDs are idle (or assigned no work). Embodiments can also save leakage power by power gating SIMDs for a duration when SIMDs are idle for an extended period of time.Type: GrantFiled: July 12, 2011Date of Patent: April 12, 2016Assignee: Advanced Micro Devices, Inc.Inventors: Tushar K. Shah, Michael J. Mantor, Brian Emberling
-
Publication number: 20150332427Abstract: Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which perform rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe array, and, in response, generates a signal. The redundant shader switch receives the generated signal, and, in response, transfers the data destined for each shader pipe identified as being defective independently to the redundant shader pipe array.Type: ApplicationFiled: July 24, 2015Publication date: November 19, 2015Applicant: ADVANCED MICRO DEVICES, INC.Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
-
Patent number: 9093040Abstract: A method and apparatus for shader data repair utilizing a Redundant Shader Switch (RSS). The RSS consists of an input and output section whereby when a defective shader pipe is detected, the RSS multiplexes shader pipe data destined to the defective shader pipe to a redundant shader pipe array for processing. Once processed, the shader pipe data is multiplexed back to the RSS where the processed shader pipe data is directed to the corresponding output column of the RSS. The RSS contains delay pipes used to re-align and synchronize the repaired shader pipe data with output export data.Type: GrantFiled: June 1, 2009Date of Patent: July 28, 2015Assignee: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
-
Patent number: 8558836Abstract: A Scalable and Unified Compute System performs scalable, repairable general purpose and graphics shading operations, memory load/store operations and texture filtering. A Scalable and Unified Compute. Unit Module comprises a shader pipe array, a texture mapping unit, and a level one texture cache system. It accepts ALU instructions, input/output instructions, and texture or memory requests for a specified set of pixels, vertices, primitives, surfaces, or general compute work items from a shader program and performs associated operations to compute the programmed output data. The texture mapping unit accepts source data addresses and instruction constants in order to fetch, format, and perform instructed filtering interpolations to generate formatted results based on the specific corresponding data stored in a level one texture cache system. The texture mapping unit consists of an address generating system, a pre-formatter module, interpolator module, accumulator module and a format module.Type: GrantFiled: June 1, 2009Date of Patent: October 15, 2013Assignee: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Jeffrey T. Brady, Mark C. Fowler, Marcos P. Zini
-
Patent number: 8473721Abstract: Disclosed herein is a processing unit configured to process video data, and applications thereof. In an embodiment, the processing unit includes a buffer and an execution unit. The buffer is configured to store a data word, wherein the data word comprises a plurality of bytes of video data. The execution unit is configured to execute a single instruction to (i) shift bytes of video data contained in the data word to align a desired byte of video data and (ii) process the desired byte of the video data to provide processed video data.Type: GrantFiled: April 16, 2010Date of Patent: June 25, 2013Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Michael J. Mantor, Jeffrey T. Brady, Christopher L. Spencer, Daniel W. Wong, Andrew E. Gruber
-
Patent number: 8468191Abstract: Systems and methods for multi-precision computation are disclosed. One embodiment of the present invention includes a plurality of multiply-add units (MADDs) configured to perform one or more single precision operations and an arrangement generator to generate one or more mantissa arrangements using a plurality of double precision numbers. Each MADD is configured to receive and load said mantissa arrangements from the arrangement generator. The MADDs compute a result of a multi-precision computation using the mantissa arrangements. In an embodiment, the MADDs are configured to simultaneously perform operations that include, single precision operations, double-precision additions and double-precision multiply and additions.Type: GrantFiled: June 10, 2010Date of Patent: June 18, 2013Assignee: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Jeffrey T. Brady, Daniel B. Clifton, Christopher Spencer
-
Patent number: 8316252Abstract: A method, computer program product, and system are provided for controlling a clock distribution network. For example, an embodiment of the method can include programming a predetermined delay time into a plurality of processing elements and controlling an activation and de-activation of these processing elements in a sequence based on the predetermined delay time. The processing elements are located in a system incorporating the clock distribution network, where the predetermined delay time can be programmed in a control register of a clock gate control circuit residing in the processing element. Further, when controlling the activation and de-activation of the processing elements, this activity can be controlled with a state machine based on the system's mode of operation.Type: GrantFiled: August 15, 2008Date of Patent: November 20, 2012Assignee: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Tushar K. Shah, Donald P. Lee
-
Patent number: 8195882Abstract: A shader pipe texture filter utilizes a level one cache system as a primary method of storage but with the ability to have the level one cache system read and write to a level two cache system when necessary. The level one cache system communicates with the level two cache system via a wide channel memory bus. In addition, the level one cache system can be configured to support dual shader pipe texture filters while maintaining access to the level two cache system. A method utilizing a level one cache system as a primary method of storage with the ability to have the level one cache system read and write a level two cache system when necessary is also presented. In addition, level one cache systems can allocate a defined area of memory to be sharable amongst other resources.Type: GrantFiled: June 1, 2009Date of Patent: June 5, 2012Assignee: Advanced Micro Devices, Inc.Inventors: Anthony P. DeLaurier, Mark Leather, Robert S. Hartog, Michael J. Mantor, Mark C. Fowler, Marcos P. Zini
-
Publication number: 20120013627Abstract: Systems and methods to improve performance in a graphics processing unit are described herein. Embodiments achieve power saving in a graphics processing unit by dynamically activating/deactivating individual SIMDs in a shader complex that comprises multiple SIMD units. On-the-fly dynamic disabling and enabling of individual SIMDs provides flexibility in achieving a required performance and power level for a given processing application. Embodiments of the invention also achieve dynamic medium grain clock gating of SIMDs in a shader complex. Embodiments reduce switching power by shutting down clock trees to unused logic by providing a clock on demand mechanism. In this way, embodiments enhance clock gating to save more switching power for the duration of time when SIMDs are idle (or assigned no work). Embodiments can also save leakage power by power gating SIMDs for a duration when SIMDs are idle for an extended period of time.Type: ApplicationFiled: July 12, 2011Publication date: January 19, 2012Applicant: Advanced Micro Devices, Inc.Inventors: Tushar K. Shah, Michael J. Mantor, Brian Emberling
-
Publication number: 20110057940Abstract: Disclosed herein is a processing unit configured to process video data, and applications thereof. In an embodiment, the processing unit includes a buffer and an execution unit. The buffer is configured to store a data word, wherein the data word comprises a plurality of bytes of video data. The execution unit is configured to execute a single instruction to (i) shift bytes of video data contained in the data word to align a desired byte of video data and (ii) process the desired byte of the video data to provide processed video data.Type: ApplicationFiled: April 16, 2010Publication date: March 10, 2011Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Michael J. MANTOR, Jeffrey T. Brady, Christopher L. Spencer, Daniel W. Wong, Andrew E. Gruber
-
Publication number: 20110055308Abstract: Systems and methods for multi-precision computation are disclosed. One embodiment of the present invention includes a plurality of multiply-add units (MADDs) configured to perform one or more single precision operations and an arrangement generator to generate one or more mantissa arrangements using a plurality of double precision numbers. Each MADD is configured to receive and load said mantissa arrangements from the arrangement generator. The MADDs compute a result of a multi-precision computation using the mantissa arrangements. In an embodiment, the MADDs are configured to simultaneously perform operations that include, single precision operations, double-precision additions and double-precision multiply and additions.Type: ApplicationFiled: June 10, 2010Publication date: March 3, 2011Applicant: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Jeffrey T. Brady, Daniel B. Clifton, Christopher Spencer
-
Publication number: 20100146211Abstract: A shader pipe texture filter utilizes a level one cache system as a primary method of storage but with the ability to have the level one cache system read and write to a level two cache system when necessary. The level one cache system communicates with the level two cache system via a wide channel memory bus. In addition, the level one cache system can be configured to support dual shader pipe texture filters while maintaining access to the level two cache system. A method utilizing a level one cache system as a primary method of storage with the ability to have the level one cache system read and write a level two cache system when necessary is also presented. In addition, level one cache systems can allocate a defined area of memory to be sharable amongst other resources.Type: ApplicationFiled: June 1, 2009Publication date: June 10, 2010Applicant: Advanced Micro Devices, Inc.Inventors: Anthony P. DeLaurier, Mark Leather, Robert S. Hartog, Michael J. Mantor, Mark C. Fowler, Marcos P. Zini
-
Publication number: 20090315909Abstract: Each row of a row based shader engine comprises a shader pipe array, a texture filter, and a level one texture cache system. The shader pipe array accepts texture requests for a specified pixel from a resource and performs associated rendering calculations, outputting texel data. The texture mapping unit receives texel data from a level one cache system and through formatting and bilinear filtering interpolations, generates a formatted bilinear result based on a specific pixel's corresponding four texels. Utilizing multiple rows of a row based shader engine within the shader engine allows for the parallel processing of multiple simultaneous resource requests. A method for texture filtering utilizing a row based shader engine is also presented.Type: ApplicationFiled: June 1, 2009Publication date: December 24, 2009Applicant: Advanced Micro Devices, Inc.Inventors: Anthony P. DeLaurier, Mark Leather, Robert S. Hartog, Michael J. Mantor, Jeffrey T. Brady, Mark C. Fowler, Marcos P. Zini
-
Publication number: 20090309896Abstract: Apparatus and systems utilizing multiple shader engines where each shader engine comprises multiple rows of shader engine filters combined with level one and level two cache systems. Each unified shader engine filter comprises a shader pipe array, and a texture mapping unit with access to a level one cache system and a level two cache. The shader pipe array accepts texture requests for a specified pixel from a resource and performs associated rendering calculations, outputting texel data. The texture mapping unit retrieves texel data stored in a level one cache system, with the ability to read and write to and from a level two cache system, and through formatting and bilinear filtering interpolations generates a formatted bilinear result based on the specific pixel's neighboring texels. Utilizing multiple rows of shader engine filters within a shader engine allows for the parallel processing of multiple simultaneous resource requests.Type: ApplicationFiled: June 1, 2009Publication date: December 17, 2009Applicant: Advanced Micro Devices, Inc.Inventors: Anthony P. DeLaurier, Mark Leather, Robert S. Hartog, Michael J. Mantor, Mark C. Fowler, Jeffrey T. Brady, Marcos P. Zini
-
Publication number: 20090300388Abstract: A method, computer program product, and system are provided for controlling a clock distribution network. For example, an embodiment of the method can include programming a predetermined delay time into a plurality of processing elements and controlling an activation and de-activation of these processing elements in a sequence based on the predetermined delay time. The processing elements are located in a system incorporating the clock distribution network, where the predetermined delay time can be programmed in a control register of a clock gate control circuit residing in the processing element. Further, when controlling the activation and de-activation of the processing elements, this activity can be controlled with a state machine based on the system's mode of operation.Type: ApplicationFiled: August 15, 2008Publication date: December 3, 2009Applicant: Advanced Micro Devices Inc.Inventors: Michael J. MANTOR, Tushar K. SHAH, Donald P. LEE
-
Publication number: 20090295821Abstract: A Scalable and Unified Compute System performs scalable, repairable general purpose and graphics shading operations, memory load/store operations and texture filtering. A Scalable and Unified Compute Unit Module comprises a shader pipe array, a texture mapping unit, and a level one texture cache system. The Scalable and Unified Compute Unit Module accepts ALU instructions, input/output instructions, and texture or memory requests for a specified set of pixels, vertices, primitives, surfaces, or general compute work items from a shader program and performs associated operations to compute the programmed output data. The texture mapping unit accepts source data addresses and instruction constants in order to fetch, format, and perform instructed filtering interpolations to generate formatted results based on the specific corresponding data stored in a level one texture cache system.Type: ApplicationFiled: June 1, 2009Publication date: December 3, 2009Applicant: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Jeffrey T. Brady, Mark C. Fowler, Marcos P. Zini
-
Publication number: 20090300293Abstract: Methods and systems for dynamically partitioning a cache and maintaining cache coherency are provided. In an embodiment, a system for processing memory requests includes a cache and a cache controller configured to compare a memory address and a type of a received memory request to a memory address and a type, respectively, corresponding to a cache line of the cache to determine whether the memory request hits on the cache line. In another embodiment, a method for processing fetch memory requests includes receiving a memory request and determining if the memory request hits on a cache line of a cache by determining if a memory address and a type of the memory request match a memory address and a type, respectively, corresponding to a cache line of the cache.Type: ApplicationFiled: July 1, 2008Publication date: December 3, 2009Applicant: Advanced Micro Devices, Inc.Inventors: Michael J. MANTOR, Brian A. Buchner, John P. McCardle, II
-
Publication number: 20090295820Abstract: A method and apparatus for shader data repair utilizing a Redundant Shader Switch (RSS). The RSS consists of an input and output section whereby when a defective shader pipe is detected, the RSS multiplexes shader pipe data destined to the defective shader pipe to a redundant shader pipe array for processing. Once processed, the shader pipe data is multiplexed back to the RSS where the processed shader pipe data is directed to the corresponding output column of the RSS. The RSS contains delay pipes used to re-align and synchronize the repaired shader pipe data with output export data.Type: ApplicationFiled: June 1, 2009Publication date: December 3, 2009Applicant: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
-
Publication number: 20090300621Abstract: A graphics processing unit is disclosed, the graphics processing unit having a processor having one or more SIMD processing units, and a local data share corresponding to one of the one or more SIMD processing units, the local data share comprising one or more low latency accessible memory regions for each group of threads assigned to one or more execution wavefronts, and a global data share comprising one or more low latency memory regions for each group of threads.Type: ApplicationFiled: June 1, 2009Publication date: December 3, 2009Applicant: Advanced Micro Devices, Inc.Inventors: Michael J. MANTOR, Brian Emberling
-
Patent number: 6943800Abstract: In a graphics processing circuit, up to N sets of state data are stored in a buffer such that a total length of the N sets of state data does not exceed the total length of the buffer. When a length of additional state data would exceed a length of available space in the buffer, storage of the additional set of state data in the buffer is delayed until at least M of the N sets of state data are no longer being used to process graphics primitives, wherein M is less than or equal to N. The buffer is preferably implemented as a ring buffer, thereby minimizing the impact of state data updates. To further prevent corruption of state data, additional sets of state data are prohibited from being added to the buffer if a maximum number of allowed states is already stored in the buffer.Type: GrantFiled: August 13, 2001Date of Patent: September 13, 2005Assignee: ATI Technologies, Inc.Inventors: Ralph C. Taylor, Michael J. Mantor