Patents by Inventor Mladen Wilder
Mladen Wilder has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250104181Abstract: Techniques are disclosed relating to data compression in graphics processors. In some embodiments, cache circuitry is coupled to shader processor circuitry and is configured to store graphics data that includes a compressed block of data associated with a surface and metadata for the compressed block of data. Metadata coherence circuitry may cache the metadata for the compressed block of data, receive an indication of a write command for non-compressed data associated with the surface, wherein the write command identifies the metadata and has a different address than the compressed block of data, and determine, based on the metadata and the indication, to invalidate the compressed block of data in the cache circuitry. This may maintain read/write coherence in a cache that stores both compressed and uncompressed data, in some embodiments.Type: ApplicationFiled: August 6, 2024Publication date: March 27, 2025Inventors: Karthik Ramani, Tyson J. Bergland, Leela Kishore Kothamasu, Hongzhou Zhao, Winnie W. Yeung, Mladen Wilder
-
Publication number: 20250103501Abstract: Techniques are disclosed relating to data compression in graphics processors. In some embodiments, first and second graphics processor cores include respective shader processor circuitry configured to execute graphics shader programs. Cache circuitry may be configured to store surface data, including a compressed block of surface data and metadata for the compressed block of surface data. Lock control circuitry may lock metadata for the second graphics processor core for the compressed block of surface data based on an access to the metadata by the first graphics processor core and prevent read accesses to the compressed block by the second graphics processor core until the lock on the metadata is released. This may provide consistency across graphics cores for compressed data.Type: ApplicationFiled: August 6, 2024Publication date: March 27, 2025Inventors: Mladen Wilder, Karthik Ramani, Tyson J. Bergland
-
Publication number: 20250086116Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receive multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry both: the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).Type: ApplicationFiled: November 26, 2024Publication date: March 13, 2025Inventors: Jedd O. Haberstro, Mladen Wilder
-
Publication number: 20250004948Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receive multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry both: the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).Type: ApplicationFiled: June 27, 2023Publication date: January 2, 2025Inventors: Jedd O. Haberstro, Mladen Wilder
-
Patent number: 12182026Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receive multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry both: the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).Type: GrantFiled: June 27, 2023Date of Patent: December 31, 2024Assignee: Apple Inc.Inventors: Jedd O. Haberstro, Mladen Wilder
-
Patent number: 12086644Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.Type: GrantFiled: August 11, 2021Date of Patent: September 10, 2024Assignee: Apple Inc.Inventors: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
-
Publication number: 20230050061Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.Type: ApplicationFiled: August 11, 2021Publication date: February 16, 2023Inventors: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
-
Patent number: 11397624Abstract: A data processing system including a data processor which is operable to execute programs to perform data processing operations and in which execution threads executing a program to perform data processing operations may be grouped together into thread groups. The data processor comprises a cross-lane permutation circuit which is operable to perform processing for cross-lane instructions which require data to be permuted (copied or moved) between the threads of a thread group. The cross-lane permutation circuit has plural data lanes between which data may be permuted (moved or copied). The number of data lanes is fewer than the number of threads in a thread group.Type: GrantFiled: January 22, 2019Date of Patent: July 26, 2022Assignee: Arm LimitedInventors: Luka Dejanovic, Mladen Wilder
-
Patent number: 10726606Abstract: When a shader program is to be executed by a graphics processor, the graphics processor is caused to execute at least two variants of the shader program and the operation of the graphics processor when executing execution threads for the different variants of the shader program is monitored. A variant of the shader program to be executed by subsequent execution threads that are to execute the shader program is then selected based on the monitoring of the operation of the shading stage when executing the execution threads for the different variants of the shader program.Type: GrantFiled: February 19, 2019Date of Patent: July 28, 2020Assignee: Arm LimitedInventors: Peter William Harris, Mladen Wilder
-
Publication number: 20200233726Abstract: A data processing system including a data processor which is operable to execute programs to perform data processing operations and in which execution threads executing a program to perform data processing operations may be grouped together into thread groups. The data processor comprises a cross-lane permutation circuit which is operable to perform processing for cross-lane instructions which require data to be permuted (copied or moved) between the threads of a thread group. The cross-lane permutation circuit has plural data lanes between which data may be permuted (moved or copied). The number data lanes is fewer than the number of threads in a thread group.Type: ApplicationFiled: January 22, 2019Publication date: July 23, 2020Applicant: Arm LimitedInventors: Luka Dejanovic, Mladen Wilder
-
Publication number: 20190259193Abstract: When a shader program is to be executed by a graphics processor, the graphics processor is caused to execute at least two variants of the shader program and the operation of the graphics processor when executing execution threads for the different variants of the shader program is monitored. A variant of the shader program to be executed by subsequent execution threads that are to execute the shader program is then selected based on the monitoring of the operation of the shading stage when executing the execution threads for the different variants of the shader program.Type: ApplicationFiled: February 19, 2019Publication date: August 22, 2019Applicant: Arm LimitedInventors: Peter William Harris, Mladen Wilder
-
Patent number: 9304926Abstract: A coherent memory system includes a plurality of level 1 cache memories 6 connected via interconnect circuitry 18 to a level 2 cache memory 8. Coherency control circuitry 10 manages coherency between lines of data. Evict messages from the level 1 cache memories to the coherency control circuitry 10 are sent via the read address channel AR. Read messages are also sent via the read address channel AR. The read address channel AR is configured such that a read message may not be reordered relative to an evict message. The coherency control circuitry 10 is configured such that a read message will not be processed ahead of an evict message. The level 1 cache memories 6 do not track in-flight evict messages. No acknowledgement of an evict message is sent from the coherency control circuitry 10 back to the level 1 cache memory 6.Type: GrantFiled: July 23, 2013Date of Patent: April 5, 2016Assignee: ARM LimitedInventors: Ian Bratt, Mladen Wilder, Ole Henrik Jahren
-
Publication number: 20150032969Abstract: A coherent memory system includes a plurality of level 1 cache memories 6 connected via interconnect circuitry 18 to a level 2 cache memory 8. Coherency control circuitry 10 manages coherency between lines of data. Evict messages from the level 1 cache memories to the coherency control circuitry 10 are sent via the read address channel AR. Read messages are also sent via the read address channel AR. The read address channel AR is configured such that a read message may not be reordered relative to an evict message. The coherency control circuitry 10 is configured such that a read message will not be processed ahead of an evict message. The level 1 cache memories 6 do not track in-flight evict messages. No acknowledgement of an evict message is sent from the coherency control circuitry 10 back to the level 1 cache memory 6.Type: ApplicationFiled: July 23, 2013Publication date: January 29, 2015Applicant: ARM LIMITEDInventors: Ian BRATT, Mladen WILDER, Ole Henrik JAHREN
-
Patent number: 8595280Abstract: A data processing apparatus and method for performing multiply-accumulate operations is provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element.Type: GrantFiled: October 29, 2010Date of Patent: November 26, 2013Assignee: ARM LimitedInventors: Dominic Hugo Symes, Mladen Wilder, Guy Larri
-
Patent number: 8473819Abstract: An electronic device is described which receives data from a transmitting device via a communications channel. The electronic device comprises digital processing circuitry arranged to process the data received via the communications channel to generate output data, error detection circuitry arranged to detect errors in the output data, and monitoring circuitry arranged to monitor the quality of digital processing conducted by the digital processing circuitry and generate digital performance data indicative of the monitored quality of digital processing. The electronic device also comprises control circuitry responsive to error information comprising errors detected by the error detection circuitry and the performance data generated by the monitoring circuitry to modify the operation of one or both of the transmitting device and the electronic device.Type: GrantFiled: July 15, 2009Date of Patent: June 25, 2013Assignee: ARM LimitedInventors: Daniel Kershaw, David Michael Bull, Mladen Wilder
-
Patent number: 8443170Abstract: An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD processing circuitry.Type: GrantFiled: September 17, 2009Date of Patent: May 14, 2013Assignee: ARM LimitedInventors: Mladen Wilder, Dominic Hugo Symes, Richard Edward Bruce
-
Patent number: 8423752Abstract: An apparatus for processing data is provided comprising processing circuitry having permutation circuitry for performing permutation operations, a register bank having a plurality of registers for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a control-generating instruction to generate in dependence upon a bit-mask control signals to configure permutation circuitry for performing permutation operation on an input operand. The bit-mask identifies within the input operand the first group of data elements having a first ordering and a second group of data elements having a second ordering and the permutation operation is such that it preserves one of the first ordering and the second ordering but changes the other of the first ordering and the second ordering.Type: GrantFiled: December 16, 2008Date of Patent: April 16, 2013Assignee: ARM LimitedInventors: Dominic Hugo Symes, Mladen Wilder
-
Patent number: 8255446Abstract: An apparatus and method are provided for performing rearrangement operations and arithmetic operations on data. The data processing apparatus has processing circuitry for performing Single Instruction Multiple Data (SIMD) processing operations and scalar processing operations, a register bank for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to responsive to a combined rearrangement arithmetic instruction to control the processing circuitry to perform a rearrangement operation and at least one SIMD arithmetic operation on a plurality of data elements stored in the register bank. The rearrangement operation is configurable by a size parameter derived at least in part from the register bank. The size parameter provides an indication of a number of data elements forming a rearrangement element for the purposes of the rearrangement operation.Type: GrantFiled: November 29, 2007Date of Patent: August 28, 2012Assignee: ARM LimitedInventors: Daniel Kershaw, Mladen Wilder, Dominic Hugo Symes
-
Publication number: 20110185262Abstract: An electronic device is described which receives data from a transmitting device via a communications channel. The electronic device comprises digital processing circuitry arranged to process the data received via the communications channel to generate output data, error detection circuitry arranged to detect errors in the output data, and monitoring circuitry arranged to monitor the quality of digital processing conducted by the digital processing circuitry and generate digital performance data indicative of the monitored quality of digital processing. The electronic device also comprises control circuitry responsive to error information comprising errors detected by the error detection circuitry and the performance data generated by the monitoring circuitry to modify the operation of one or both of the transmitting device and the electronic device.Type: ApplicationFiled: July 15, 2009Publication date: July 28, 2011Inventors: Daniel Kershaw, David Michael Bull, Mladen Wilder
-
Publication number: 20110106871Abstract: A data processing apparatus and method for performing multiply-accumulate operations is provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element.Type: ApplicationFiled: October 29, 2010Publication date: May 5, 2011Applicant: ARM LIMITEDInventors: Dominic Hugo Symes, Mladen Wilder, Guy Larri