Patents by Inventor Mladen Wilder

Mladen Wilder has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250104181
    Abstract: Techniques are disclosed relating to data compression in graphics processors. In some embodiments, cache circuitry is coupled to shader processor circuitry and is configured to store graphics data that includes a compressed block of data associated with a surface and metadata for the compressed block of data. Metadata coherence circuitry may cache the metadata for the compressed block of data, receive an indication of a write command for non-compressed data associated with the surface, wherein the write command identifies the metadata and has a different address than the compressed block of data, and determine, based on the metadata and the indication, to invalidate the compressed block of data in the cache circuitry. This may maintain read/write coherence in a cache that stores both compressed and uncompressed data, in some embodiments.
    Type: Application
    Filed: August 6, 2024
    Publication date: March 27, 2025
    Inventors: Karthik Ramani, Tyson J. Bergland, Leela Kishore Kothamasu, Hongzhou Zhao, Winnie W. Yeung, Mladen Wilder
  • Publication number: 20250103501
    Abstract: Techniques are disclosed relating to data compression in graphics processors. In some embodiments, first and second graphics processor cores include respective shader processor circuitry configured to execute graphics shader programs. Cache circuitry may be configured to store surface data, including a compressed block of surface data and metadata for the compressed block of surface data. Lock control circuitry may lock metadata for the second graphics processor core for the compressed block of surface data based on an access to the metadata by the first graphics processor core and prevent read accesses to the compressed block by the second graphics processor core until the lock on the metadata is released. This may provide consistency across graphics cores for compressed data.
    Type: Application
    Filed: August 6, 2024
    Publication date: March 27, 2025
    Inventors: Mladen Wilder, Karthik Ramani, Tyson J. Bergland
  • Publication number: 20250086116
    Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receive multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry both: the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).
    Type: Application
    Filed: November 26, 2024
    Publication date: March 13, 2025
    Inventors: Jedd O. Haberstro, Mladen Wilder
  • Publication number: 20250004948
    Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receive multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry both: the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).
    Type: Application
    Filed: June 27, 2023
    Publication date: January 2, 2025
    Inventors: Jedd O. Haberstro, Mladen Wilder
  • Patent number: 12182026
    Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receive multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry both: the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).
    Type: Grant
    Filed: June 27, 2023
    Date of Patent: December 31, 2024
    Assignee: Apple Inc.
    Inventors: Jedd O. Haberstro, Mladen Wilder
  • Patent number: 12086644
    Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: September 10, 2024
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
  • Publication number: 20230050061
    Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
    Type: Application
    Filed: August 11, 2021
    Publication date: February 16, 2023
    Inventors: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
  • Patent number: 11397624
    Abstract: A data processing system including a data processor which is operable to execute programs to perform data processing operations and in which execution threads executing a program to perform data processing operations may be grouped together into thread groups. The data processor comprises a cross-lane permutation circuit which is operable to perform processing for cross-lane instructions which require data to be permuted (copied or moved) between the threads of a thread group. The cross-lane permutation circuit has plural data lanes between which data may be permuted (moved or copied). The number of data lanes is fewer than the number of threads in a thread group.
    Type: Grant
    Filed: January 22, 2019
    Date of Patent: July 26, 2022
    Assignee: Arm Limited
    Inventors: Luka Dejanovic, Mladen Wilder
  • Patent number: 10726606
    Abstract: When a shader program is to be executed by a graphics processor, the graphics processor is caused to execute at least two variants of the shader program and the operation of the graphics processor when executing execution threads for the different variants of the shader program is monitored. A variant of the shader program to be executed by subsequent execution threads that are to execute the shader program is then selected based on the monitoring of the operation of the shading stage when executing the execution threads for the different variants of the shader program.
    Type: Grant
    Filed: February 19, 2019
    Date of Patent: July 28, 2020
    Assignee: Arm Limited
    Inventors: Peter William Harris, Mladen Wilder
  • Publication number: 20200233726
    Abstract: A data processing system including a data processor which is operable to execute programs to perform data processing operations and in which execution threads executing a program to perform data processing operations may be grouped together into thread groups. The data processor comprises a cross-lane permutation circuit which is operable to perform processing for cross-lane instructions which require data to be permuted (copied or moved) between the threads of a thread group. The cross-lane permutation circuit has plural data lanes between which data may be permuted (moved or copied). The number data lanes is fewer than the number of threads in a thread group.
    Type: Application
    Filed: January 22, 2019
    Publication date: July 23, 2020
    Applicant: Arm Limited
    Inventors: Luka Dejanovic, Mladen Wilder
  • Publication number: 20190259193
    Abstract: When a shader program is to be executed by a graphics processor, the graphics processor is caused to execute at least two variants of the shader program and the operation of the graphics processor when executing execution threads for the different variants of the shader program is monitored. A variant of the shader program to be executed by subsequent execution threads that are to execute the shader program is then selected based on the monitoring of the operation of the shading stage when executing the execution threads for the different variants of the shader program.
    Type: Application
    Filed: February 19, 2019
    Publication date: August 22, 2019
    Applicant: Arm Limited
    Inventors: Peter William Harris, Mladen Wilder
  • Patent number: 9304926
    Abstract: A coherent memory system includes a plurality of level 1 cache memories 6 connected via interconnect circuitry 18 to a level 2 cache memory 8. Coherency control circuitry 10 manages coherency between lines of data. Evict messages from the level 1 cache memories to the coherency control circuitry 10 are sent via the read address channel AR. Read messages are also sent via the read address channel AR. The read address channel AR is configured such that a read message may not be reordered relative to an evict message. The coherency control circuitry 10 is configured such that a read message will not be processed ahead of an evict message. The level 1 cache memories 6 do not track in-flight evict messages. No acknowledgement of an evict message is sent from the coherency control circuitry 10 back to the level 1 cache memory 6.
    Type: Grant
    Filed: July 23, 2013
    Date of Patent: April 5, 2016
    Assignee: ARM Limited
    Inventors: Ian Bratt, Mladen Wilder, Ole Henrik Jahren
  • Publication number: 20150032969
    Abstract: A coherent memory system includes a plurality of level 1 cache memories 6 connected via interconnect circuitry 18 to a level 2 cache memory 8. Coherency control circuitry 10 manages coherency between lines of data. Evict messages from the level 1 cache memories to the coherency control circuitry 10 are sent via the read address channel AR. Read messages are also sent via the read address channel AR. The read address channel AR is configured such that a read message may not be reordered relative to an evict message. The coherency control circuitry 10 is configured such that a read message will not be processed ahead of an evict message. The level 1 cache memories 6 do not track in-flight evict messages. No acknowledgement of an evict message is sent from the coherency control circuitry 10 back to the level 1 cache memory 6.
    Type: Application
    Filed: July 23, 2013
    Publication date: January 29, 2015
    Applicant: ARM LIMITED
    Inventors: Ian BRATT, Mladen WILDER, Ole Henrik JAHREN
  • Patent number: 8595280
    Abstract: A data processing apparatus and method for performing multiply-accumulate operations is provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element.
    Type: Grant
    Filed: October 29, 2010
    Date of Patent: November 26, 2013
    Assignee: ARM Limited
    Inventors: Dominic Hugo Symes, Mladen Wilder, Guy Larri
  • Patent number: 8473819
    Abstract: An electronic device is described which receives data from a transmitting device via a communications channel. The electronic device comprises digital processing circuitry arranged to process the data received via the communications channel to generate output data, error detection circuitry arranged to detect errors in the output data, and monitoring circuitry arranged to monitor the quality of digital processing conducted by the digital processing circuitry and generate digital performance data indicative of the monitored quality of digital processing. The electronic device also comprises control circuitry responsive to error information comprising errors detected by the error detection circuitry and the performance data generated by the monitoring circuitry to modify the operation of one or both of the transmitting device and the electronic device.
    Type: Grant
    Filed: July 15, 2009
    Date of Patent: June 25, 2013
    Assignee: ARM Limited
    Inventors: Daniel Kershaw, David Michael Bull, Mladen Wilder
  • Patent number: 8443170
    Abstract: An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD processing circuitry.
    Type: Grant
    Filed: September 17, 2009
    Date of Patent: May 14, 2013
    Assignee: ARM Limited
    Inventors: Mladen Wilder, Dominic Hugo Symes, Richard Edward Bruce
  • Patent number: 8423752
    Abstract: An apparatus for processing data is provided comprising processing circuitry having permutation circuitry for performing permutation operations, a register bank having a plurality of registers for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a control-generating instruction to generate in dependence upon a bit-mask control signals to configure permutation circuitry for performing permutation operation on an input operand. The bit-mask identifies within the input operand the first group of data elements having a first ordering and a second group of data elements having a second ordering and the permutation operation is such that it preserves one of the first ordering and the second ordering but changes the other of the first ordering and the second ordering.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: April 16, 2013
    Assignee: ARM Limited
    Inventors: Dominic Hugo Symes, Mladen Wilder
  • Patent number: 8255446
    Abstract: An apparatus and method are provided for performing rearrangement operations and arithmetic operations on data. The data processing apparatus has processing circuitry for performing Single Instruction Multiple Data (SIMD) processing operations and scalar processing operations, a register bank for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to responsive to a combined rearrangement arithmetic instruction to control the processing circuitry to perform a rearrangement operation and at least one SIMD arithmetic operation on a plurality of data elements stored in the register bank. The rearrangement operation is configurable by a size parameter derived at least in part from the register bank. The size parameter provides an indication of a number of data elements forming a rearrangement element for the purposes of the rearrangement operation.
    Type: Grant
    Filed: November 29, 2007
    Date of Patent: August 28, 2012
    Assignee: ARM Limited
    Inventors: Daniel Kershaw, Mladen Wilder, Dominic Hugo Symes
  • Publication number: 20110185262
    Abstract: An electronic device is described which receives data from a transmitting device via a communications channel. The electronic device comprises digital processing circuitry arranged to process the data received via the communications channel to generate output data, error detection circuitry arranged to detect errors in the output data, and monitoring circuitry arranged to monitor the quality of digital processing conducted by the digital processing circuitry and generate digital performance data indicative of the monitored quality of digital processing. The electronic device also comprises control circuitry responsive to error information comprising errors detected by the error detection circuitry and the performance data generated by the monitoring circuitry to modify the operation of one or both of the transmitting device and the electronic device.
    Type: Application
    Filed: July 15, 2009
    Publication date: July 28, 2011
    Inventors: Daniel Kershaw, David Michael Bull, Mladen Wilder
  • Publication number: 20110106871
    Abstract: A data processing apparatus and method for performing multiply-accumulate operations is provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element.
    Type: Application
    Filed: October 29, 2010
    Publication date: May 5, 2011
    Applicant: ARM LIMITED
    Inventors: Dominic Hugo Symes, Mladen Wilder, Guy Larri