Patents by Inventor Terence M. Potter
Terence M. Potter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20200104180Abstract: In general, embodiments are disclosed for tracking and allocating graphics processor hardware resources. More particularly, a graphics hardware resource allocation system is able to generate a priority list for a plurality of data masters for graphics processor based on a comparison between a current utilizations for the data masters and a target utilizations for the data masters. The graphics hardware resource allocation system designate, based on the priority list, a first data master with a higher priority to submit work to the graphics processor compared to a second data master. The graphics hardware resource allocation system determines a stall counter value for the data master and generates a notification to pause work for the second data master based on the stall counter value.Type: ApplicationFiled: September 28, 2018Publication date: April 2, 2020Inventors: Kutty Banerjee, Benjamin Bowman, Terence M. Potter, Tatsuya Iwamoto, Gokhan Avkarogullari
-
Publication number: 20200065104Abstract: Techniques are disclosed relating to controlling an operand cache in a pipelined fashion. An operand cache may cache operands fetched from the register file or generated by previous instructions to improve performance and/or reduce power consumption. In some embodiments, instructions are pipelined and separate tag information is maintained to indicate allocation of an operand cache entry and ownership of the operand cache entry. In some embodiments, this may allow an operand to remain in the operand cache (and potentially be retrieved or modified) during an interval between allocation of the entry for another operand and ownership of the entry by the other operand. This may improve operand cache efficiency by allowing the entry to be used while to retrieving the other operand from the register file, for example.Type: ApplicationFiled: August 24, 2018Publication date: February 27, 2020Inventors: Robert D. Kenney, Terence M. Potter, Andrew M. Havlir, Sivayya V. Ayinala
-
Patent number: 10503546Abstract: In general, techniques are disclosed for tracking and allocating graphics processor hardware over specified periods of time. More particularly, hardware sensors may be used to determine the utilization of graphics processor hardware after each of a number of specified intervals (referred to as “sample intervals”). The utilization values so captured may be combined after a first number of sample intervals (the combined interval referred to as an “epoch interval”) and used to determine a normalized utilization of the graphic processor's hardware resources. Normalized epoch utilization values have been adjusted to account for resources used by concurrently executing processes. In some embodiments, a lower priority process that obtains and fails to release resources that should be allocated to one or more higher priority processes may be detected, paused, and its hardware resources given to the higher priority processes.Type: GrantFiled: June 6, 2017Date of Patent: December 10, 2019Assignee: Apple Inc.Inventors: Tatsuya Iwamoto, Kutty Banerjee, Benjiman L. Goodman, Terence M. Potter
-
Patent number: 10504270Abstract: Techniques are disclosed relating to synchronizing access to pixel resources. Examples of pixel resources include color attachments, a stencil buffer, and a depth buffer. In some embodiments, hardware registers are used to track status of assigned pixel resources and pixel wait and pixel release instruction are used to synchronize access to the pixel resources. In some embodiments, other accesses to the pixel resources may occur out of program order. Relative to tracking and ordering pass groups, this weak ordering and explicit synchronization may improve performance and reduce power consumption. Disclosed techniques may also facilitate coordination between fragment rendering threads and auxiliary mid-render compute tasks.Type: GrantFiled: December 22, 2016Date of Patent: December 10, 2019Assignee: Apple Inc.Inventors: Terence M. Potter, Richard W. Schreyer, James J. Ding, Alexander K. Kan, Michael Imbrogno
-
Patent number: 10445852Abstract: Techniques are disclosed relating to a hardware-supported flexible data structure for graphics processing. In some embodiments, dimensions of the data structure are configurable in an X direction, a Y direction, a number of samples per pixel, and an amount of data per sample. In some embodiments, these attributes are configurable using hardware registers. In some embodiments, the data structure is persistent across a tile being processed such that local memory context is accessible to both rendering threads of a render pass and mid-render compute threads.Type: GrantFiled: December 22, 2016Date of Patent: October 15, 2019Assignee: Apple Inc.Inventors: Terence M. Potter, Robert Kenney, Aaftab A. Munshi, Justin A. Hensley, Richard W. Schreyer
-
Patent number: 10387119Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads.Type: GrantFiled: September 28, 2018Date of Patent: August 20, 2019Assignee: Apple Inc.Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
-
Patent number: 10353711Abstract: Techniques are disclosed relating to clause-based execution of program instructions, which may be single-instruction multiple data (SIMD) computer instructions. In some embodiments, an apparatus includes execution circuitry configured to receive clauses of instructions and SIMD groups of input data to be operated on by the clauses. In some embodiments, the apparatus further includes one or more storage elements configured to store state information for clauses processed by the execution circuitry. In some embodiments, the apparatus further includes scheduling circuitry configured to send instructions of a first clause and corresponding input data for execution by the execution circuitry and indicate, prior to sending instruction and input data of a second clause to the execution circuitry for execution, whether the second clause and a first clause are assigned to operate on groups of input data corresponding to the same instruction stream.Type: GrantFiled: September 6, 2016Date of Patent: July 16, 2019Assignee: Apple Inc.Inventors: Andrew M. Havlir, Brian K. Reynolds, Liang Xia, Terence M. Potter
-
Providing instruction characteristics to graphics scheduling circuitry based on decoded instructions
Patent number: 10324726Abstract: Techniques are disclosed relating to scheduling graphics instructions for execution on different types of execution units based on characteristics of decoded and cached graphics instruction. In some embodiments, a graphics unit includes multiple different types of execution units that are configured to execute different types of instructions (e.g., different units for datapath, sample, load/store, etc.). In some embodiments, the graphics unit stores decoded instructions in an instruction cache in at least one cache level, along with information specifying characteristics of the instructions. The characteristics may be stored at clause granularity and may indicate the type of instructions in each clause (e.g., corresponding to which type of execution unit is configured to execute the instructions).Type: GrantFiled: February 10, 2017Date of Patent: June 18, 2019Assignee: Apple Inc.Inventors: Michael A. Geary, Brian K. Reynolds, Terence M. Potter -
Patent number: 10324844Abstract: Techniques are disclosed relating to memory consistency in a memory hierarchy with relaxed ordering. In some embodiments, an apparatus includes a first level cache that is shared by a plurality of shader processing elements and a second level cache that is shared by the shader processing elements and at least a texture processing unit. In some embodiments, the apparatus is configured to execute operations specified by graphics instructions that include (1) an attribute of the operation that specifies a type of memory consistency to be imposed for the operation and (2) scope information for the attribute that specifies whether the memory consistency specified by the attribute should be enforced at the first level cache or the second level cache. In some embodiments, the apparatus is configured to determine whether to sequence memory accesses at the first level cache and the second level cache based on the attribute and the scope.Type: GrantFiled: December 22, 2016Date of Patent: June 18, 2019Assignee: Apple Inc.Inventors: Anthony P. DeLaurier, Owen C. Anderson, Michael J. Swift, Aaftab A. Munshi, Terence M. Potter
-
Patent number: 10282169Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies result in the first floating-point format and the second instruction type specifies fused multiply addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.Type: GrantFiled: April 6, 2016Date of Patent: May 7, 2019Assignee: Apple Inc.Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir, Yu Sun, Nicolas X. Pena, Xiao-Long Wu, Christopher A. Burns
-
Patent number: 10223822Abstract: Techniques are disclosed relating to performing mid-render auxiliary compute tasks for graphics processing. In some embodiments, auxiliary compute tasks are performed during a render pass, using at least a portion of a memory context of the render pass, without accessing a shared memory during the render pass. Relative to flushing render data to shared memory to perform compute tasks, this may reduce memory accesses and/or cache thrashing, which may in turn increase performance and/or reduce power consumption.Type: GrantFiled: December 22, 2016Date of Patent: March 5, 2019Assignee: Apple Inc.Inventors: Terence M. Potter, Ralph C. Taylor, Richard W. Schreyer, Aaftab A. Munshi, Justin A. Hensley
-
Publication number: 20190042312Abstract: In various embodiments, a resource allocation management circuit may allocate a plurality of different types of hardware resources (e.g., different types of registers) to a plurality of threads. The different types of hardware resources may correspond to a plurality of hardware resource allocation circuits. The resource allocation management circuit may track allocation of the hardware resources to the threads using state identification values of the threads. In response to determining that fewer than a respective requested number of one or more types of the hardware resources are available, the resource allocation management circuit may identify one or more threads for deallocation. As a result, the hardware resource allocation system may allocate hardware resources to threads more efficiently (e.g.Type: ApplicationFiled: August 4, 2017Publication date: February 7, 2019Inventors: Mark D. Earl, Dimitri Tan, Christopher L. Spencer, Jeffrey T. Brady, Ralph C. Taylor, Terence M. Potter
-
Publication number: 20190034166Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads.Type: ApplicationFiled: September 28, 2018Publication date: January 31, 2019Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
-
Publication number: 20180349146Abstract: In general, techniques are disclosed for tracking and allocating graphics processor hardware over specified periods of time. More particularly, hardware sensors may be used to determine the utilization of graphics processor hardware after each of a number of specified intervals (referred to as “sample intervals”). The utilization values so captured may be combined after a first number of sample intervals (the combined interval referred to as an “epoch interval”) and used to determine a normalized utilization of the graphic processor's hardware resources. Normalized epoch utilization values have been adjusted to account for resources used by concurrently executing processes. In some embodiments, a lower priority process that obtains and fails to release resources that should be allocated to one or more higher priority processes may be detected, paused, and its hardware resources given to the higher priority processes.Type: ApplicationFiled: June 6, 2017Publication date: December 6, 2018Inventors: Tatsuya Iwamoto, Kutty Banerjee, Benjiman L. Goodman, Terence M. Potter
-
Patent number: 10089077Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus performs an arithmetic operation using first circuitry, on type value inputs for different threads that are encoded to represent values to be operated on by the first circuitry. In some embodiments, second arithmetic circuitry is configured to perform an arithmetic operation on an output of the first circuitry and an input (e.g., address information such as a base and an offset) that is common to the different threads and has a greater number of bits than the output of the first circuitry. In various embodiments, disclosed techniques may allow decoding of encoded values for different threads (which may reduce memory requirements relative to non-encoded values) with a shorter critical path and lower power consumption, e.g., relative to sequential decoding.Type: GrantFiled: January 10, 2017Date of Patent: October 2, 2018Assignee: Apple Inc.Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
-
Publication number: 20180182058Abstract: Techniques are disclosed relating to a hardware-supported flexible data structure for graphics processing. In some embodiments, dimensions of the data structure are configurable in an X direction, a Y direction, a number of samples per pixel, and an amount of data per sample. In some embodiments, these attributes are configurable using hardware registers. In some embodiments, the data structure is persistent across a tile being processed such that local memory context is accessible to both rendering threads of a render pass and mid-render compute threads.Type: ApplicationFiled: December 22, 2016Publication date: June 28, 2018Inventors: Terence M. Potter, Robert Kenney, Aaftab A. Munshi, Justin A. Hensley, Richard W. Schreyer
-
Publication number: 20180182154Abstract: Techniques are disclosed relating to synchronizing access to pixel resources. Examples of pixel resources include color attachments, a stencil buffer, and a depth buffer. In some embodiments, hardware registers are used to track status of assigned pixel resources and pixel wait and pixel release instruction are used to synchronize access to the pixel resources. In some embodiments, other accesses to the pixel resources may occur out of program order. Relative to tracking and ordering pass groups, this weak ordering and explicit synchronization may improve performance and reduce power consumption. Disclosed techniques may also facilitate coordination between fragment rendering threads and auxiliary mid-render compute tasks.Type: ApplicationFiled: December 22, 2016Publication date: June 28, 2018Inventors: Terence M. Potter, Richard W. Schreyer, James J. Ding, Alexander K. Kan, Michael Imbrogno
-
Publication number: 20180181489Abstract: Techniques are disclosed relating to memory consistency in a memory hierarchy with relaxed ordering. In some embodiments, an apparatus includes a first level cache that is shared by a plurality of shader processing elements and a second level cache that is shared by the shader processing elements and at least a texture processing unit. In some embodiments, the apparatus is configured to execute operations specified by graphics instructions that include (1) an attribute of the operation that specifies a type of memory consistency to be imposed for the operation and (2) scope information for the attribute that specifies whether the memory consistency specified by the attribute should be enforced at the first level cache or the second level cache. In some embodiments, the apparatus is configured to determine whether to sequence memory accesses at the first level cache and the second level cache based on the attribute and the scope.Type: ApplicationFiled: December 22, 2016Publication date: June 28, 2018Inventors: Anthony P. DeLaurier, Owen C. Anderson, Michael J. Swift, Aaftab A. Munshi, Terence M. Potter
-
Publication number: 20180182153Abstract: Techniques are disclosed relating to performing mid-render auxiliary compute tasks for graphics processing. In some embodiments, auxiliary compute tasks are performed during a render pass, using at least a portion of a memory context of the render pass, without accessing a shared memory during the render pass. Relative to flushing render data to shared memory to perform compute tasks, this may reduce memory accesses and/or cache thrashing, which may in turn increase performance and/or reduce power consumption.Type: ApplicationFiled: December 22, 2016Publication date: June 28, 2018Inventors: Terence M. Potter, Ralph C. Taylor, Richard W. Schreyer, Aaftab A. Munshi, Justin A. Hensley
-
Publication number: 20180173560Abstract: In various embodiments, hardware resources of a processing circuit may be allocated to a plurality of processes based on priorities of the processes. A hardware resource utilization sensor may detect a current utilization of the hardware resources by a process. A utilization accumulation circuit may determine a utilization of the hardware resources by the process over a particular amount of time. A target utilization of the hardware resources for the process may be determined based on the utilization of the hardware resources over the particular amount of time. A comparator circuit may compare the current utilization to the target utilization. A process priority adjustment circuit may adjust a priority of the process based on the comparison. Based on the adjusted priority, a different amount of hardware resources may be allocated to the processes.Type: ApplicationFiled: December 21, 2016Publication date: June 21, 2018Inventors: Gokhan Avkarogullari, Terence M. Potter, Benjiman L. Goodman, Ralph C. Taylor, Kutty Banerjee