Patents by Inventor Mladen Wilder

Mladen Wilder has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Coherency Control for Compressed Graphics Data

Publication number: 20250104181

Abstract: Techniques are disclosed relating to data compression in graphics processors. In some embodiments, cache circuitry is coupled to shader processor circuitry and is configured to store graphics data that includes a compressed block of data associated with a surface and metadata for the compressed block of data. Metadata coherence circuitry may cache the metadata for the compressed block of data, receive an indication of a write command for non-compressed data associated with the surface, wherein the write command identifies the metadata and has a different address than the compressed block of data, and determine, based on the metadata and the indication, to invalidate the compressed block of data in the cache circuitry. This may maintain read/write coherence in a cache that stores both compressed and uncompressed data, in some embodiments.

Type: Application

Filed: August 6, 2024

Publication date: March 27, 2025

Inventors: Karthik Ramani, Tyson J. Bergland, Leela Kishore Kothamasu, Hongzhou Zhao, Winnie W. Yeung, Mladen Wilder
Consistency for Compressed Data Across Graphics Cores

Publication number: 20250103501

Abstract: Techniques are disclosed relating to data compression in graphics processors. In some embodiments, first and second graphics processor cores include respective shader processor circuitry configured to execute graphics shader programs. Cache circuitry may be configured to store surface data, including a compressed block of surface data and metadata for the compressed block of surface data. Lock control circuitry may lock metadata for the second graphics processor core for the compressed block of surface data based on an access to the metadata by the first graphics processor core and prevent read accesses to the compressed block by the second graphics processor core until the lock on the metadata is released. This may provide consistency across graphics cores for compressed data.

Type: Application

Filed: August 6, 2024

Publication date: March 27, 2025

Inventors: Mladen Wilder, Karthik Ramani, Tyson J. Bergland
Atomic Smashing

Publication number: 20250086116

Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receives multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria, and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry, both the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).

Type: Application

Filed: November 26, 2024

Publication date: March 13, 2025

Inventors: Jedd O. Haberstro, Mladen Wilder
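
The abstract above describes combining ("smashing") atomic updates at a cache before they reach the coherence point. The following is a minimal software sketch of that general idea, not the patented circuit. Every name in it (COMBINABLE, Memory, CoalescingCache) is hypothetical, and the combining criterion (the new request uses the same add/min/max operation as the pending one) is an assumption, since the abstract does not state the criteria. For simplicity the sketch keeps a combined operand plus an operation tag, whereas the abstract speaks of an updated data value plus operation information.

```python
from dataclasses import dataclass, field

# Hypothetical set of operations that can be folded together because they
# are associative: op(op(m, a), b) == op(m, op(a, b)).
COMBINABLE = {
    "add": lambda a, b: a + b,
    "min": min,
    "max": max,
}


@dataclass
class Memory:
    """Stand-in for the higher level of the hierarchy (assumed coherence point)."""
    values: dict = field(default_factory=dict)
    atomic_ops_received: int = 0

    def atomic(self, addr, op, operand):
        # Count every atomic that actually reaches this level.
        self.atomic_ops_received += 1
        self.values[addr] = COMBINABLE[op](self.values.get(addr, 0), operand)


@dataclass
class CoalescingCache:
    """Folds same-kind atomics per address and forwards one atomic on flush."""
    memory: Memory
    pending: dict = field(default_factory=dict)  # addr -> (op, combined operand)

    def atomic(self, addr, op, operand):
        if addr in self.pending:
            prev_op, prev_operand = self.pending[addr]
            if prev_op == op:
                # Assumed criterion met: fold the new operand into the pending one.
                self.pending[addr] = (op, COMBINABLE[op](prev_operand, operand))
                return
            # Different operation: push the pending one out first to keep order.
            self.flush(addr)
        self.pending[addr] = (op, operand)

    def flush(self, addr=None):
        # Flush one address, or everything when no address is given.
        addrs = [addr] if addr is not None else list(self.pending)
        for a in addrs:
            op, operand = self.pending.pop(a)
            self.memory.atomic(a, op, operand)


mem = Memory(values={0x40: 10})
cache = CoalescingCache(mem)
for _ in range(4):
    cache.atomic(0x40, "add", 5)   # four adds fold into a single add of 20
cache.atomic(0x40, "max", 100)     # different operation forces the add out first
cache.flush()
print(mem.values[0x40], mem.atomic_ops_received)
```

In this sketch five requests reach the cache but only two atomics reach the higher level, which is the kind of traffic reduction the abstract refers to.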
Computer Processor Architecture for Coalescing of Atomic Operations

Publication number: 20250004948

Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receives multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria, and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry, both the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).

Type: Application

Filed: June 27, 2023

Publication date: January 2, 2025

Inventors: Jedd O. Haberstro, Mladen Wilder
Computer processor architecture for coalescing of atomic operations

Patent number: 12182026

Abstract: Techniques are disclosed relating to smashing atomic operations. In some embodiments, cache control circuitry caches data values in cache storage circuitry and receives multiple requests to atomically update a cached data value according to one or more arithmetic operations. The control circuitry may perform updates to a cached data value based on the multiple requests, in response to determining that the one or more arithmetic operations meet one or more criteria, and store operation information that indicates a most-recent requested atomic arithmetic operation for the updated data value. The control circuitry may, in response to an event, flush, to a higher level in a memory hierarchy that includes the cache storage circuitry, both the updated data value and the operation information. This may advantageously smash atomic operations at the cache and reduce operations to the higher-level cache or memory (which may be the actual coherence point for atomic requests).

Type: Grant

Filed: June 27, 2023

Date of Patent: December 31, 2024

Assignee: Apple Inc.

Inventors: Jedd O. Haberstro, Mladen Wilder
Logical slot to hardware slot mapping for graphics processors

Patent number: 12086644

Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.

Type: Grant

Filed: August 11, 2021

Date of Patent: September 10, 2024

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
Logical Slot to Hardware Slot Mapping for Graphics Processors

Publication number: 20230050061

Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.

Type: Application

Filed: August 11, 2021

Publication date: February 16, 2023

Inventors: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
Execution of cross-lane operations in data processing systems

Patent number: 11397624

Abstract: A data processing system including a data processor which is operable to execute programs to perform data processing operations and in which execution threads executing a program to perform data processing operations may be grouped together into thread groups. The data processor comprises a cross-lane permutation circuit which is operable to perform processing for cross-lane instructions which require data to be permuted (copied or moved) between the threads of a thread group. The cross-lane permutation circuit has plural data lanes between which data may be permuted (moved or copied). The number of data lanes is fewer than the number of threads in a thread group.

Type: Grant

Filed: January 22, 2019

Date of Patent: July 26, 2022

Assignee: Arm Limited

Inventors: Luka Dejanovic, Mladen Wilder
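
The abstract above states that the permutation circuit has fewer data lanes than there are threads in a thread group. One way to picture this, purely as an assumption and not as the mechanism claimed in the patent, is a narrow permute unit that handles the thread group in several passes. The function name permute_thread_group, the source_of mapping, and the pass scheduling below are all hypothetical.

```python
def permute_thread_group(values, source_of, num_lanes=4):
    """Compute out[t] = values[source_of[t]] with a num_lanes-wide permute unit.

    Each pass loads one block of num_lanes source values and delivers them to
    one block of num_lanes destination threads. Passes that would move no
    data are skipped. Returns the permuted values and the pass count.
    """
    n = len(values)
    if n % num_lanes != 0 or len(source_of) != n:
        raise ValueError("thread group size must be a multiple of num_lanes")

    out = [None] * n
    passes = 0
    for dst_base in range(0, n, num_lanes):
        for src_base in range(0, n, num_lanes):
            lanes = values[src_base:src_base + num_lanes]  # data held in the lanes
            used = False
            for t in range(dst_base, dst_base + num_lanes):
                s = source_of[t]
                if src_base <= s < src_base + num_lanes:
                    out[t] = lanes[s - src_base]
                    used = True
            if used:
                passes += 1
    return out, passes


# A 16-thread group where every thread reads the value of its right-hand neighbour.
values = list(range(100, 116))
source_of = [(t + 1) % 16 for t in range(16)]
out, passes = permute_thread_group(values, source_of, num_lanes=4)
print(out, passes)
```

The sketch trades time for width: a neighbour rotation across 16 threads needs 8 passes through the 4-lane unit here, where a full-width unit would need one.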
Shader program selection in graphics processing systems

Patent number: 10726606

Abstract: When a shader program is to be executed by a graphics processor, the graphics processor is caused to execute at least two variants of the shader program and the operation of the graphics processor when executing execution threads for the different variants of the shader program is monitored. A variant of the shader program to be executed by subsequent execution threads that are to execute the shader program is then selected based on the monitoring of the operation of the shading stage when executing the execution threads for the different variants of the shader program.

Type: Grant

Filed: February 19, 2019

Date of Patent: July 28, 2020

Assignee: Arm Limited

Inventors: Peter William Harris, Mladen Wilder
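
The selection scheme in the abstract above (run at least two variants, monitor them, then pick one for subsequent threads) can be sketched in software as follows. This is an illustrative model only: the class VariantSelector, the per-thread "cost" metric, the trial length, and the variant names are assumptions, since the abstract does not say what is monitored or for how long.

```python
from collections import defaultdict


class VariantSelector:
    """Tries each shader variant on a few threads, then locks in the cheapest."""

    def __init__(self, variants, trial_threads_per_variant=4):
        self.variants = list(variants)
        self.trial = trial_threads_per_variant
        self.samples = defaultdict(list)  # variant -> observed costs
        self.selected = None

    def next_variant(self):
        # After the trial phase, every subsequent thread uses the chosen variant.
        if self.selected is not None:
            return self.selected
        # During the trial phase, pick the variant with the fewest samples so far.
        return min(self.variants, key=lambda v: len(self.samples[v]))

    def report(self, variant, cost):
        # Record a monitored cost (for example, cycles) for one finished thread.
        self.samples[variant].append(cost)
        trial_done = all(len(self.samples[v]) >= self.trial for v in self.variants)
        if self.selected is None and trial_done:
            self.selected = min(
                self.variants,
                key=lambda v: sum(self.samples[v]) / len(self.samples[v]),
            )


# Hypothetical per-thread costs for two variants of the same shader program.
measured_cost = {"unrolled": 90, "looped": 120}
selector = VariantSelector(measured_cost.keys())
history = []
for _ in range(20):
    variant = selector.next_variant()
    history.append(variant)
    selector.report(variant, measured_cost[variant])
print(selector.selected, history.count("looped"))
```

With these made-up costs, the first eight threads alternate between the two variants and the remaining twelve all run the cheaper one.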
DATA PROCESSING SYSTEMS

Publication number: 20200233726

Abstract: A data processing system including a data processor which is operable to execute programs to perform data processing operations and in which execution threads executing a program to perform data processing operations may be grouped together into thread groups. The data processor comprises a cross-lane permutation circuit which is operable to perform processing for cross-lane instructions which require data to be permuted (copied or moved) between the threads of a thread group. The cross-lane permutation circuit has plural data lanes between which data may be permuted (moved or copied). The number of data lanes is fewer than the number of threads in a thread group.

Type: Application

Filed: January 22, 2019

Publication date: July 23, 2020

Applicant: Arm Limited

Inventors: Luka Dejanovic, Mladen Wilder
GRAPHICS PROCESSING

Publication number: 20190259193

Abstract: When a shader program is to be executed by a graphics processor, the graphics processor is caused to execute at least two variants of the shader program and the operation of the graphics processor when executing execution threads for the different variants of the shader program is monitored. A variant of the shader program to be executed by subsequent execution threads that are to execute the shader program is then selected based on the monitoring of the operation of the shading stage when executing the execution threads for the different variants of the shader program.

Type: Application

Filed: February 19, 2019

Publication date: August 22, 2019

Applicant: Arm Limited

Inventors: Peter William Harris, Mladen Wilder
Coherency control message flow

Patent number: 9304926

Abstract: A coherent memory system includes a plurality of level 1 cache memories 6 connected via interconnect circuitry 18 to a level 2 cache memory 8. Coherency control circuitry 10 manages coherency between lines of data. Evict messages from the level 1 cache memories to the coherency control circuitry 10 are sent via the read address channel AR. Read messages are also sent via the read address channel AR. The read address channel AR is configured such that a read message may not be reordered relative to an evict message. The coherency control circuitry 10 is configured such that a read message will not be processed ahead of an evict message. The level 1 cache memories 6 do not track in-flight evict messages. No acknowledgement of an evict message is sent from the coherency control circuitry 10 back to the level 1 cache memory 6.

Type: Grant

Filed: July 23, 2013

Date of Patent: April 5, 2016

Assignee: ARM Limited

Inventors: Ian Bratt, Mladen Wilder, Ole Henrik Jahren
COHERENCY CONTROL MESSAGE FLOW

Publication number: 20150032969

Abstract: A coherent memory system includes a plurality of level 1 cache memories 6 connected via interconnect circuitry 18 to a level 2 cache memory 8. Coherency control circuitry 10 manages coherency between lines of data. Evict messages from the level 1 cache memories to the coherency control circuitry 10 are sent via the read address channel AR. Read messages are also sent via the read address channel AR. The read address channel AR is configured such that a read message may not be reordered relative to an evict message. The coherency control circuitry 10 is configured such that a read message will not be processed ahead of an evict message. The level 1 cache memories 6 do not track in-flight evict messages. No acknowledgement of an evict message is sent from the coherency control circuitry 10 back to the level 1 cache memory 6.

Type: Application

Filed: July 23, 2013

Publication date: January 29, 2015

Applicant: ARM LIMITED

Inventors: Ian BRATT, Mladen WILDER, Ole Henrik JAHREN
Apparatus and method for performing multiply-accumulate operations

Patent number: 8595280

Abstract: A data processing apparatus and method for performing multiply-accumulate operations is provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element.

Type: Grant

Filed: October 29, 2010

Date of Patent: November 26, 2013

Assignee: ARM Limited

Inventors: Dominic Hugo Symes, Mladen Wilder, Guy Larri
Error management

Patent number: 8473819

Abstract: An electronic device is described which receives data from a transmitting device via a communications channel. The electronic device comprises digital processing circuitry arranged to process the data received via the communications channel to generate output data, error detection circuitry arranged to detect errors in the output data, and monitoring circuitry arranged to monitor the quality of digital processing conducted by the digital processing circuitry and generate digital performance data indicative of the monitored quality of digital processing. The electronic device also comprises control circuitry responsive to error information comprising errors detected by the error detection circuitry and the performance data generated by the monitoring circuitry to modify the operation of one or both of the transmitting device and the electronic device.

Type: Grant

Filed: July 15, 2009

Date of Patent: June 25, 2013

Assignee: ARM Limited

Inventors: Daniel Kershaw, David Michael Bull, Mladen Wilder
Apparatus and method for performing SIMD multiply-accumulate operations

Patent number: 8443170

Abstract: An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD processing circuitry.

Type: Grant

Filed: September 17, 2009

Date of Patent: May 14, 2013

Assignee: ARM Limited

Inventors: Mladen Wilder, Dominic Hugo Symes, Richard Edward Bruce
Apparatus and method for performing permutation operations in which the ordering of one of a first group and a second group of data elements is preserved and the ordering of the other group of data elements is changed

Patent number: 8423752

Abstract: An apparatus for processing data is provided comprising processing circuitry having permutation circuitry for performing permutation operations, a register bank having a plurality of registers for storing data, and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a control-generating instruction to generate, in dependence upon a bit-mask, control signals to configure the permutation circuitry for performing a permutation operation on an input operand. The bit-mask identifies within the input operand a first group of data elements having a first ordering and a second group of data elements having a second ordering, and the permutation operation is such that it preserves one of the first ordering and the second ordering but changes the other of the first ordering and the second ordering.

Type: Grant

Filed: December 16, 2008

Date of Patent: April 16, 2013

Assignee: ARM Limited

Inventors: Dominic Hugo Symes, Mladen Wilder
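
To make the bit-mask idea in the abstract above concrete, here is a small sketch of one permutation that fits the description: the mask splits the operand into two groups, the first group keeps its ordering, and the second group's ordering is changed. Reversing the second group is an arbitrary choice made for illustration, and the function name masked_permute and the bit numbering (bit i of the mask corresponds to element i) are assumptions rather than details from the patent.

```python
def masked_permute(elements, mask):
    """Partition elements by mask bits, preserving order in only one group.

    Bit i of mask (least significant bit = element 0) set means element i
    belongs to the first group; clear means it belongs to the second group.
    The first group is packed first in its original order. The second group
    follows with its order reversed (an illustrative choice).
    """
    first = [e for i, e in enumerate(elements) if (mask >> i) & 1]
    second = [e for i, e in enumerate(elements) if not (mask >> i) & 1]
    return first + second[::-1]


elements = list("abcdefgh")
mask = 0b10100101  # elements 0, 2, 5 and 7 form the first group
result = masked_permute(elements, mask)
print(result)  # ['a', 'c', 'f', 'h', 'g', 'e', 'd', 'b']
```

The first group (a, c, f, h) appears in its original relative order, while the second group (b, d, e, g) appears reversed.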
Apparatus and method for performing rearrangement and arithmetic operations on data

Patent number: 8255446

Abstract: An apparatus and method are provided for performing rearrangement operations and arithmetic operations on data. The data processing apparatus has processing circuitry for performing Single Instruction Multiple Data (SIMD) processing operations and scalar processing operations, a register bank for storing data, and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a combined rearrangement arithmetic instruction to control the processing circuitry to perform a rearrangement operation and at least one SIMD arithmetic operation on a plurality of data elements stored in the register bank. The rearrangement operation is configurable by a size parameter derived at least in part from the register bank. The size parameter provides an indication of a number of data elements forming a rearrangement element for the purposes of the rearrangement operation.

Type: Grant

Filed: November 29, 2007

Date of Patent: August 28, 2012

Assignee: ARM Limited

Inventors: Daniel Kershaw, Mladen Wilder, Dominic Hugo Symes
ERROR MANAGEMENT

Publication number: 20110185262

Abstract: An electronic device is described which receives data from a transmitting device via a communications channel. The electronic device comprises digital processing circuitry arranged to process the data received via the communications channel to generate output data, error detection circuitry arranged to detect errors in the output data, and monitoring circuitry arranged to monitor the quality of digital processing conducted by the digital processing circuitry and generate digital performance data indicative of the monitored quality of digital processing. The electronic device also comprises control circuitry responsive to error information comprising errors detected by the error detection circuitry and the performance data generated by the monitoring circuitry to modify the operation of one or both of the transmitting device and the electronic device.

Type: Application

Filed: July 15, 2009

Publication date: July 28, 2011

Inventors: Daniel Kershaw, David Michael Bull, Mladen Wilder
Apparatus and method for performing multiply-accumulate operations

Publication number: 20110106871

Abstract: A data processing apparatus and method for performing multiply-accumulate operations is provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element.

Type: Application

Filed: October 29, 2010

Publication date: May 5, 2011

Applicant: ARM LIMITED

Inventors: Dominic Hugo Symes, Mladen Wilder, Guy Larri
