Patents by Inventor Terence M. Potter

Terence M. Potter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10445852
    Abstract: Techniques are disclosed relating to a hardware-supported flexible data structure for graphics processing. In some embodiments, dimensions of the data structure are configurable in an X direction, a Y direction, a number of samples per pixel, and an amount of data per sample. In some embodiments, these attributes are configurable using hardware registers. In some embodiments, the data structure is persistent across a tile being processed such that local memory context is accessible to both rendering threads of a render pass and mid-render compute threads.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: October 15, 2019
    Assignee: Apple Inc.
    Inventors: Terence M. Potter, Robert Kenney, Aaftab A. Munshi, Justin A. Hensley, Richard W. Schreyer
  • Patent number: 10387119
    Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: August 20, 2019
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
  • Patent number: 10353711
    Abstract: Techniques are disclosed relating to clause-based execution of program instructions, which may be single-instruction multiple data (SIMD) computer instructions. In some embodiments, an apparatus includes execution circuitry configured to receive clauses of instructions and SIMD groups of input data to be operated on by the clauses. In some embodiments, the apparatus further includes one or more storage elements configured to store state information for clauses processed by the execution circuitry. In some embodiments, the apparatus further includes scheduling circuitry configured to send instructions of a first clause and corresponding input data for execution by the execution circuitry and indicate, prior to sending instruction and input data of a second clause to the execution circuitry for execution, whether the second clause and a first clause are assigned to operate on groups of input data corresponding to the same instruction stream.
    Type: Grant
    Filed: September 6, 2016
    Date of Patent: July 16, 2019
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Brian K. Reynolds, Liang Xia, Terence M. Potter
  • Patent number: 10324726
    Abstract: Techniques are disclosed relating to scheduling graphics instructions for execution on different types of execution units based on characteristics of decoded and cached graphics instruction. In some embodiments, a graphics unit includes multiple different types of execution units that are configured to execute different types of instructions (e.g., different units for datapath, sample, load/store, etc.). In some embodiments, the graphics unit stores decoded instructions in an instruction cache in at least one cache level, along with information specifying characteristics of the instructions. The characteristics may be stored at clause granularity and may indicate the type of instructions in each clause (e.g., corresponding to which type of execution unit is configured to execute the instructions).
    Type: Grant
    Filed: February 10, 2017
    Date of Patent: June 18, 2019
    Assignee: Apple Inc.
    Inventors: Michael A. Geary, Brian K. Reynolds, Terence M. Potter
  • Patent number: 10324844
    Abstract: Techniques are disclosed relating to memory consistency in a memory hierarchy with relaxed ordering. In some embodiments, an apparatus includes a first level cache that is shared by a plurality of shader processing elements and a second level cache that is shared by the shader processing elements and at least a texture processing unit. In some embodiments, the apparatus is configured to execute operations specified by graphics instructions that include (1) an attribute of the operation that specifies a type of memory consistency to be imposed for the operation and (2) scope information for the attribute that specifies whether the memory consistency specified by the attribute should be enforced at the first level cache or the second level cache. In some embodiments, the apparatus is configured to determine whether to sequence memory accesses at the first level cache and the second level cache based on the attribute and the scope.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: June 18, 2019
    Assignee: Apple Inc.
    Inventors: Anthony P. DeLaurier, Owen C. Anderson, Michael J. Swift, Aaftab A. Munshi, Terence M. Potter
  • Patent number: 10282169
    Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies result in the first floating-point format and the second instruction type specifies fused multiply addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.
    Type: Grant
    Filed: April 6, 2016
    Date of Patent: May 7, 2019
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir, Yu Sun, Nicolas X. Pena, Xiao-Long Wu, Christopher A. Burns
  • Patent number: 10223822
    Abstract: Techniques are disclosed relating to performing mid-render auxiliary compute tasks for graphics processing. In some embodiments, auxiliary compute tasks are performed during a render pass, using at least a portion of a memory context of the render pass, without accessing a shared memory during the render pass. Relative to flushing render data to shared memory to perform compute tasks, this may reduce memory accesses and/or cache thrashing, which may in turn increase performance and/or reduce power consumption.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: March 5, 2019
    Assignee: Apple Inc.
    Inventors: Terence M. Potter, Ralph C. Taylor, Richard W. Schreyer, Aaftab A. Munshi, Justin A. Hensley
  • Publication number: 20190042312
    Abstract: In various embodiments, a resource allocation management circuit may allocate a plurality of different types of hardware resources (e.g., different types of registers) to a plurality of threads. The different types of hardware resources may correspond to a plurality of hardware resource allocation circuits. The resource allocation management circuit may track allocation of the hardware resources to the threads using state identification values of the threads. In response to determining that fewer than a respective requested number of one or more types of the hardware resources are available, the resource allocation management circuit may identify one or more threads for deallocation. As a result, the hardware resource allocation system may allocate hardware resources to threads more efficiently (e.g.
    Type: Application
    Filed: August 4, 2017
    Publication date: February 7, 2019
    Inventors: Mark D. Earl, Dimitri Tan, Christopher L. Spencer, Jeffrey T. Brady, Ralph C. Taylor, Terence M. Potter
  • Publication number: 20190034166
    Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads.
    Type: Application
    Filed: September 28, 2018
    Publication date: January 31, 2019
    Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
  • Publication number: 20180349146
    Abstract: In general, techniques are disclosed for tracking and allocating graphics processor hardware over specified periods of time. More particularly, hardware sensors may be used to determine the utilization of graphics processor hardware after each of a number of specified intervals (referred to as “sample intervals”). The utilization values so captured may be combined after a first number of sample intervals (the combined interval referred to as an “epoch interval”) and used to determine a normalized utilization of the graphic processor's hardware resources. Normalized epoch utilization values have been adjusted to account for resources used by concurrently executing processes. In some embodiments, a lower priority process that obtains and fails to release resources that should be allocated to one or more higher priority processes may be detected, paused, and its hardware resources given to the higher priority processes.
    Type: Application
    Filed: June 6, 2017
    Publication date: December 6, 2018
    Inventors: Tatsuya Iwamoto, Kutty Banerjee, Benjiman L. Goodman, Terence M. Potter
  • Patent number: 10089077
    Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus performs an arithmetic operation using first circuitry, on type value inputs for different threads that are encoded to represent values to be operated on by the first circuitry. In some embodiments, second arithmetic circuitry is configured to perform an arithmetic operation on an output of the first circuitry and an input (e.g., address information such as a base and an offset) that is common to the different threads and has a greater number of bits than the output of the first circuitry. In various embodiments, disclosed techniques may allow decoding of encoded values for different threads (which may reduce memory requirements relative to non-encoded values) with a shorter critical path and lower power consumption, e.g., relative to sequential decoding.
    Type: Grant
    Filed: January 10, 2017
    Date of Patent: October 2, 2018
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
  • Publication number: 20180182058
    Abstract: Techniques are disclosed relating to a hardware-supported flexible data structure for graphics processing. In some embodiments, dimensions of the data structure are configurable in an X direction, a Y direction, a number of samples per pixel, and an amount of data per sample. In some embodiments, these attributes are configurable using hardware registers. In some embodiments, the data structure is persistent across a tile being processed such that local memory context is accessible to both rendering threads of a render pass and mid-render compute threads.
    Type: Application
    Filed: December 22, 2016
    Publication date: June 28, 2018
    Inventors: Terence M. Potter, Robert Kenney, Aaftab A. Munshi, Justin A. Hensley, Richard W. Schreyer
  • Publication number: 20180181489
    Abstract: Techniques are disclosed relating to memory consistency in a memory hierarchy with relaxed ordering. In some embodiments, an apparatus includes a first level cache that is shared by a plurality of shader processing elements and a second level cache that is shared by the shader processing elements and at least a texture processing unit. In some embodiments, the apparatus is configured to execute operations specified by graphics instructions that include (1) an attribute of the operation that specifies a type of memory consistency to be imposed for the operation and (2) scope information for the attribute that specifies whether the memory consistency specified by the attribute should be enforced at the first level cache or the second level cache. In some embodiments, the apparatus is configured to determine whether to sequence memory accesses at the first level cache and the second level cache based on the attribute and the scope.
    Type: Application
    Filed: December 22, 2016
    Publication date: June 28, 2018
    Inventors: Anthony P. DeLaurier, Owen C. Anderson, Michael J. Swift, Aaftab A. Munshi, Terence M. Potter
  • Publication number: 20180182154
    Abstract: Techniques are disclosed relating to synchronizing access to pixel resources. Examples of pixel resources include color attachments, a stencil buffer, and a depth buffer. In some embodiments, hardware registers are used to track status of assigned pixel resources and pixel wait and pixel release instruction are used to synchronize access to the pixel resources. In some embodiments, other accesses to the pixel resources may occur out of program order. Relative to tracking and ordering pass groups, this weak ordering and explicit synchronization may improve performance and reduce power consumption. Disclosed techniques may also facilitate coordination between fragment rendering threads and auxiliary mid-render compute tasks.
    Type: Application
    Filed: December 22, 2016
    Publication date: June 28, 2018
    Inventors: Terence M. Potter, Richard W. Schreyer, James J. Ding, Alexander K. Kan, Michael Imbrogno
  • Publication number: 20180182153
    Abstract: Techniques are disclosed relating to performing mid-render auxiliary compute tasks for graphics processing. In some embodiments, auxiliary compute tasks are performed during a render pass, using at least a portion of a memory context of the render pass, without accessing a shared memory during the render pass. Relative to flushing render data to shared memory to perform compute tasks, this may reduce memory accesses and/or cache thrashing, which may in turn increase performance and/or reduce power consumption.
    Type: Application
    Filed: December 22, 2016
    Publication date: June 28, 2018
    Inventors: Terence M. Potter, Ralph C. Taylor, Richard W. Schreyer, Aaftab A. Munshi, Justin A. Hensley
  • Publication number: 20180173560
    Abstract: In various embodiments, hardware resources of a processing circuit may be allocated to a plurality of processes based on priorities of the processes. A hardware resource utilization sensor may detect a current utilization of the hardware resources by a process. A utilization accumulation circuit may determine a utilization of the hardware resources by the process over a particular amount of time. A target utilization of the hardware resources for the process may be determined based on the utilization of the hardware resources over the particular amount of time. A comparator circuit may compare the current utilization to the target utilization. A process priority adjustment circuit may adjust a priority of the process based on the comparison. Based on the adjusted priority, a different amount of hardware resources may be allocated to the processes.
    Type: Application
    Filed: December 21, 2016
    Publication date: June 21, 2018
    Inventors: Gokhan Avkarogullari, Terence M. Potter, Benjiman L. Goodman, Ralph C. Taylor, Kutty Banerjee
  • Publication number: 20180089090
    Abstract: In some embodiments, a system includes an execution unit, a register file, an operand cache, and a predication control circuit. Operands identified by an instruction may be stored in the operand cache. One or more entries of the operand cache that store the operands may be marked as dirty. The predication control circuit may identify an instruction as having an unresolved predication state. Subsequent to initiating execution of the instruction, the predication control circuit may receive results of the at least one unresolved conditional instruction. In response to the results indicating the instruction has a known-to-execute predication state, the predication control circuit may initiate writing, in the operand cache, results of executing the instruction. In response to the results indicating the instruction has a known-not-to-execute predication state, the predication control circuit may prevent the results from executing the instruction from being written in the operand cache.
    Type: Application
    Filed: September 23, 2016
    Publication date: March 29, 2018
    Inventors: Andrew M. Havlir, Terence M. Potter
  • Publication number: 20180067748
    Abstract: Techniques are disclosed relating to clause-based execution of program instructions, which may be single-instruction multiple data (SIMD) computer instructions. In some embodiments, an apparatus includes execution circuitry configured to receive clauses of instructions and SIMD groups of input data to be operated on by the clauses. In some embodiments, the apparatus further includes one or more storage elements configured to store state information for clauses processed by the execution circuitry. In some embodiments, the apparatus further includes scheduling circuitry configured to send instructions of a first clause and corresponding input data for execution by the execution circuitry and indicate, prior to sending instruction and input data of a second clause to the execution circuitry for execution, whether the second clause and a first clause are assigned to operate on groups of input data corresponding to the same instruction stream.
    Type: Application
    Filed: September 6, 2016
    Publication date: March 8, 2018
    Inventors: Andrew M. Havlir, Brian K. Reynolds, Liang Xia, Terence M. Potter
  • Patent number: 9846579
    Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
    Type: Grant
    Filed: June 13, 2016
    Date of Patent: December 19, 2017
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir
  • Publication number: 20170357506
    Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
    Type: Application
    Filed: June 13, 2016
    Publication date: December 14, 2017
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir