Patents by Inventor Terence M. Potter

Terence M. Potter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180089090
    Abstract: In some embodiments, a system includes an execution unit, a register file, an operand cache, and a predication control circuit. Operands identified by an instruction may be stored in the operand cache. One or more entries of the operand cache that store the operands may be marked as dirty. The predication control circuit may identify an instruction as having an unresolved predication state. Subsequent to initiating execution of the instruction, the predication control circuit may receive results of at least one unresolved conditional instruction. In response to the results indicating the instruction has a known-to-execute predication state, the predication control circuit may initiate writing, in the operand cache, results of executing the instruction. In response to the results indicating the instruction has a known-not-to-execute predication state, the predication control circuit may prevent the results of executing the instruction from being written to the operand cache.
    Type: Application
    Filed: September 23, 2016
    Publication date: March 29, 2018
    Inventors: Andrew M. Havlir, Terence M. Potter
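    The C sketch below models how the predication handling described in publication 20180089090 might behave in software: a speculatively produced result is either committed to an operand-cache entry (and marked dirty) or dropped once the predicate resolves. The OcEntry structure, the PredState names, and resolve_predication are placeholders invented for this sketch, not the circuit's actual interfaces.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Hypothetical model of one operand-cache entry. */
      typedef struct {
          uint32_t value;
          bool     valid;
          bool     dirty;   /* modified relative to the register file */
      } OcEntry;

      /* Predication states assumed for this sketch. */
      typedef enum { PRED_UNRESOLVED, PRED_EXECUTE, PRED_SKIP } PredState;

      /* Called once the conditional instruction the predicate depends on has
       * produced its result: commit or drop the speculative result. */
      static void resolve_predication(OcEntry *dst, uint32_t spec_result, PredState resolved)
      {
          if (resolved == PRED_EXECUTE) {
              dst->value = spec_result;   /* write the result into the operand cache */
              dst->valid = true;
              dst->dirty = true;          /* must eventually be written back */
          }
          /* PRED_SKIP: leave the entry untouched; the speculative result is discarded. */
      }

      int main(void)
      {
          OcEntry e = { .value = 7, .valid = true, .dirty = false };
          resolve_predication(&e, 42, PRED_EXECUTE);
          printf("value=%u dirty=%d\n", (unsigned)e.value, e.dirty);   /* value=42 dirty=1 */
          resolve_predication(&e, 99, PRED_SKIP);
          printf("value=%u dirty=%d\n", (unsigned)e.value, e.dirty);   /* still 42 */
          return 0;
      }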
  • Publication number: 20180067748
    Abstract: Techniques are disclosed relating to clause-based execution of program instructions, which may be single-instruction multiple data (SIMD) computer instructions. In some embodiments, an apparatus includes execution circuitry configured to receive clauses of instructions and SIMD groups of input data to be operated on by the clauses. In some embodiments, the apparatus further includes one or more storage elements configured to store state information for clauses processed by the execution circuitry. In some embodiments, the apparatus further includes scheduling circuitry configured to send instructions of a first clause and corresponding input data for execution by the execution circuitry and indicate, prior to sending instructions and input data of a second clause to the execution circuitry for execution, whether the second clause and the first clause are assigned to operate on groups of input data corresponding to the same instruction stream.
    Type: Application
    Filed: September 6, 2016
    Publication date: March 8, 2018
    Inventors: Andrew M. Havlir, Brian K. Reynolds, Liang Xia, Terence M. Potter
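    As a minimal illustration of the scheduling check in publication 20180067748, the sketch below models a clause as a short run of instructions tagged with the SIMD group (instruction stream) it operates on, and reports whether the next clause shares a stream with the clause already in flight. The Clause structure and same_stream are assumptions made for the sketch.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Hypothetical descriptor for a queued clause: a run of instructions plus
       * the SIMD group (instruction stream) it operates on. */
      typedef struct {
          uint32_t first_instr;   /* index of the first instruction in the clause */
          uint32_t num_instrs;    /* clause length                                */
          uint32_t stream_id;     /* which instruction stream / SIMD group        */
      } Clause;

      /* Before dispatching the next clause, the scheduler indicates whether it
       * belongs to the same instruction stream as the clause in flight, so the
       * execution circuitry knows whether per-stream state can be reused. */
      static bool same_stream(const Clause *in_flight, const Clause *next)
      {
          return in_flight->stream_id == next->stream_id;
      }

      int main(void)
      {
          Clause a = { 0, 4, 17 }, b = { 4, 3, 17 }, c = { 7, 5, 23 };
          printf("b same stream as a: %d\n", same_stream(&a, &b));   /* 1 */
          printf("c same stream as a: %d\n", same_stream(&a, &c));   /* 0 */
          return 0;
      }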
  • Patent number: 9846579
    Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
    Type: Grant
    Filed: June 13, 2016
    Date of Patent: December 19, 2017
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir
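    The comparison idea in patent 9846579 (reusing one integer subtractor for both integer and floating-point compares) can be approximated in C. This sketch uses the standard order-preserving transform of IEEE-754 single-precision bit patterns rather than the patent's exact padding scheme, which the abstract does not spell out; float_key and compare_keys are names invented here.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      /* Map an IEEE-754 single's bit pattern to an unsigned key whose integer
       * order matches the float order (NaNs excluded): negative values have
       * their bits inverted, non-negative values get the sign bit set. */
      static uint32_t float_key(float f)
      {
          uint32_t u;
          memcpy(&u, &f, sizeof u);
          return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
      }

      /* One subtract-based compare used for both formats: returns <0, 0, >0. */
      static int compare_keys(uint64_t a, uint64_t b)
      {
          int64_t diff = (int64_t)a - (int64_t)b;   /* widened so it cannot overflow */
          return (diff > 0) - (diff < 0);
      }

      int main(void)
      {
          printf("%d\n", compare_keys(float_key(-1.5f), float_key(2.0f)));   /* -1 */
          printf("%d\n", compare_keys(float_key(3.0f),  float_key(3.0f)));   /*  0 */
          printf("%d\n", compare_keys(5u, 2u));                              /*  1: plain integers */
          return 0;
      }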
  • Publication number: 20170357506
    Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
    Type: Application
    Filed: June 13, 2016
    Publication date: December 14, 2017
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir
  • Publication number: 20170293470
    Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies fused multiply-addition of input operands in a first floating-point format to generate a result in the first floating-point format, and the second instruction type specifies fused multiply-addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.
    Type: Application
    Filed: April 6, 2016
    Publication date: October 12, 2017
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir, Yu Sun, Nicolas X. Pena, Xiao-Long Wu, Christopher A. Burns
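    A rough software analogue of the two instruction types in publication 20170293470 is shown below, using the hypothetical names fma32 and fma16. The sketch rounds twice (first to single, then to half precision) for simplicity, whereas the hardware described rounds the exact product-sum once, directly into the 16-bit format; the half-precision conversion here also flushes subnormal results to zero.

      #include <math.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      /* Round a float to IEEE-754 half precision (round-to-nearest-even).
       * Sketch only: half-precision subnormal results are flushed to zero. */
      static uint16_t f32_to_f16(float f)
      {
          uint32_t x; memcpy(&x, &f, sizeof x);
          uint32_t sign = (x >> 16) & 0x8000u;
          uint32_t E    = (x >> 23) & 0xFFu;
          uint32_t mant = x & 0x7FFFFFu;
          int32_t  exp  = (int32_t)E - 127 + 15;                 /* re-biased exponent       */

          if (E == 0xFFu)  return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u)); /* Inf/NaN */
          if (exp >= 0x1F) return (uint16_t)(sign | 0x7C00u);    /* overflow -> infinity     */
          if (exp <= 0)    return (uint16_t)sign;                /* sketch: flush tiny to 0  */

          uint16_t h   = (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
          uint32_t rem = mant & 0x1FFFu;                         /* 13 discarded bits        */
          if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
              h++;                                               /* round to nearest even    */
          return h;
      }

      /* First instruction type: FMA with a 32-bit result (single rounding via fmaf). */
      static float fma32(float a, float b, float c) { return fmaf(a, b, c); }

      /* Second instruction type: same 32-bit inputs, result delivered in 16 bits. */
      static uint16_t fma16(float a, float b, float c) { return f32_to_f16(fmaf(a, b, c)); }

      int main(void)
      {
          float a = 1.0009765625f, b = 3.0f, c = 0.25f;          /* arbitrary test values    */
          printf("fma32 = %.10g\n", fma32(a, b, c));
          printf("fma16 = 0x%04x\n", (unsigned)fma16(a, b, c));
          return 0;
      }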
  • Patent number: 9785567
    Abstract: Techniques are disclosed relating to per-pipeline control for an operand cache. In some embodiments, an apparatus includes a register file and multiple execution pipelines. In some embodiments, the apparatus also includes an operand cache that includes multiple entries that each include multiple portions that are each configured to store an operand for a corresponding execution pipeline. In some embodiments, the operand cache is configured, during operation of the apparatus, to store data in only a subset of the portions of an entry. In some embodiments, the apparatus is configured to store, for each entry in the operand cache, a per-entry validity value that indicates whether the entry is valid and per-portion state information that indicates whether data for each portion is valid and whether data for each portion is modified relative to data in a corresponding entry in the register file.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: October 10, 2017
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Terence M. Potter, Liang-Kai Wang
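    The per-portion bookkeeping in patent 9785567 can be pictured as an entry with one operand slot per execution pipeline plus per-portion valid and dirty bits, as in the sketch below. NUM_PIPES, OcEntry, oc_write, and oc_writeback are assumptions made for illustration.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      #define NUM_PIPES 4   /* assumed number of execution pipelines */

      /* One operand-cache entry: each portion holds the operand for one pipeline.
       * Per-entry validity plus per-portion valid/dirty bits let the cache keep
       * data for only a subset of the pipelines. */
      typedef struct {
          uint32_t portion[NUM_PIPES];
          uint8_t  portion_valid;   /* bit i: portion i holds real data             */
          uint8_t  portion_dirty;   /* bit i: portion i differs from register file  */
          bool     entry_valid;
      } OcEntry;

      /* Write an operand produced by one pipeline into its portion of the entry. */
      static void oc_write(OcEntry *e, unsigned pipe, uint32_t value)
      {
          e->portion[pipe]  = value;
          e->portion_valid |= (uint8_t)(1u << pipe);
          e->portion_dirty |= (uint8_t)(1u << pipe);
          e->entry_valid    = true;
      }

      /* On eviction, only the dirty portions need to be written back. */
      static void oc_writeback(OcEntry *e, uint32_t reg_file_row[NUM_PIPES])
      {
          for (unsigned p = 0; p < NUM_PIPES; p++)
              if (e->portion_dirty & (1u << p))
                  reg_file_row[p] = e->portion[p];
          e->portion_dirty = 0;
      }

      int main(void)
      {
          OcEntry e = {0};
          uint32_t row[NUM_PIPES] = {0};
          oc_write(&e, 2, 0xABCD);          /* only pipeline 2 produced data */
          oc_writeback(&e, row);
          printf("row[2]=0x%X valid=0x%X\n", (unsigned)row[2], (unsigned)e.portion_valid);
          return 0;
      }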
  • Patent number: 9652233
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Hint values may be used in some embodiments to suggest that a particular operand should be stored in the operand cache (so that it is available for current or future use). In one embodiment, a hint value indicates that an operand should be cached whenever possible. Hint values may be determined by software, such as a compiler, in some embodiments. One or more criteria may be used to determine hint values, such as how soon in the future or how frequently an operand will be used again.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: May 16, 2017
    Assignee: Apple Inc.
    Inventors: Terence M. Potter, Timothy A. Olson, James S. Blomgren, Andrew M. Havlir, Michael Geary
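    To make the compiler-side hints in patent 9652233 concrete, the sketch below marks an operand as worth caching when its next reuse falls within a small instruction window. The reuse-window heuristic, REUSE_WINDOW, and want_operand_cache_hint are hypothetical; the patent leaves the exact criteria open.

      #include <stdint.h>
      #include <stdio.h>

      #define REUSE_WINDOW 8   /* assumed tuning parameter */

      /* Hypothetical compiler-side heuristic: emit a "cache this operand whenever
       * possible" hint when the register defined at def_index is read again
       * within the next REUSE_WINDOW instructions. */
      static int want_operand_cache_hint(const uint8_t *reg_used_by, int num_instrs,
                                         int def_index, uint8_t reg)
      {
          for (int i = def_index + 1; i < num_instrs && i <= def_index + REUSE_WINDOW; i++)
              if (reg_used_by[i] == reg)
                  return 1;     /* reused soon: suggest keeping it in the operand cache */
          return 0;
      }

      int main(void)
      {
          /* reg_used_by[i] = source register read by instruction i (toy encoding). */
          uint8_t uses[] = { 1, 2, 3, 1, 4, 5, 6, 7, 8, 9 };
          printf("hint for r1 defined at 0: %d\n",
                 want_operand_cache_hint(uses, 10, 0, 1));   /* 1: reused at instruction 3 */
          printf("hint for r2 defined at 1: %d\n",
                 want_operand_cache_hint(uses, 10, 1, 2));   /* 0: never reused afterwards */
          return 0;
      }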
  • Patent number: 9632785
    Abstract: Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result.
    Type: Grant
    Filed: August 10, 2016
    Date of Patent: April 25, 2017
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
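    The mapping-value encoding in patent 9632785 can be modelled as a small indirection: an instruction carries a set of location values (treated here as register numbers) and per-source mapping values that select among them, possibly letting several sources share one location. The field names and sizes in this sketch are assumptions.

      #include <stdint.h>
      #include <stdio.h>

      #define NUM_LOCATIONS 3   /* location values carried by the instruction (assumed) */
      #define NUM_SOURCES   3   /* source inputs of the execution block (assumed)       */

      /* Hypothetical encoding: each location value identifies an operand, and each
       * source carries a mapping value selecting one of the location values. */
      typedef struct {
          uint8_t location[NUM_LOCATIONS];   /* e.g. register numbers                */
          uint8_t src_map[NUM_SOURCES];      /* source i reads location[src_map[i]]  */
      } Instruction;

      static void bind_sources(const Instruction *insn, const uint32_t *regs,
                               uint32_t src_input[NUM_SOURCES])
      {
          for (int s = 0; s < NUM_SOURCES; s++) {
              uint8_t loc = insn->src_map[s];             /* mapping value -> location value */
              src_input[s] = regs[insn->location[loc]];   /* location value -> operand       */
          }
      }

      int main(void)
      {
          uint32_t regs[8] = { 0, 10, 20, 30, 40, 50, 60, 70 };
          Instruction insn = { .location = { 3, 5, 1 }, .src_map = { 2, 0, 0 } };
          uint32_t inputs[NUM_SOURCES];
          bind_sources(&insn, regs, inputs);
          printf("%u %u %u\n", (unsigned)inputs[0], (unsigned)inputs[1],
                 (unsigned)inputs[2]);                    /* 10 30 30 */
          return 0;
      }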
  • Patent number: 9619394
    Abstract: An apparatus includes an operand cache for storing operands from a register file for use by execution circuitry. In some embodiments, eviction priority for the operand cache is based on the status of entries (e.g., whether dirty or clean) and the retention priority of entries. In some embodiments, flushes are handled differently based on their retention priority (e.g., low-priority entries may be pre-emptively flushed). In some embodiments, timing for cache clean operations is specified on a per-instruction basis. Disclosed techniques may spread out write backs in time, facilitate cache clean operations, facilitate thread switching, extend the time operands are available in an operand cache, and/or improve the use of compiler hints, in some embodiments.
    Type: Grant
    Filed: July 21, 2015
    Date of Patent: April 11, 2017
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Terence M. Potter
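    A possible reading of the eviction ordering in patent 9619394 is sketched below: invalid entries are taken first, then clean entries in order of retention priority, and dirty entries only as a last resort because they require a write back. The scoring scheme is an illustration, not the circuit's actual policy.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      #define OC_ENTRIES 8

      /* Per-entry state assumed for this sketch. */
      typedef struct {
          bool    valid;
          bool    dirty;       /* needs a write back before reuse            */
          uint8_t retention;   /* 0 = low retention priority, higher = keep  */
      } OcState;

      /* Choose an eviction victim: prefer invalid entries, then clean entries with
       * the lowest retention priority, then dirty ones (clean costs no write back). */
      static int pick_victim(const OcState oc[OC_ENTRIES])
      {
          int best = -1, best_score = 1 << 30;
          for (int i = 0; i < OC_ENTRIES; i++) {
              int score;
              if (!oc[i].valid)      score = 0;                      /* free entry   */
              else if (!oc[i].dirty) score = 1 + oc[i].retention;    /* clean        */
              else                   score = 100 + oc[i].retention;  /* dirty: avoid */
              if (score < best_score) { best_score = score; best = i; }
          }
          return best;
      }

      int main(void)
      {
          OcState oc[OC_ENTRIES] = {
              { true, true, 0 }, { true, false, 2 }, { true, false, 0 }, { true, true, 1 },
              { true, true, 2 }, { true, true,  1 }, { true, true,  3 }, { true, true, 2 },
          };
          printf("victim = %d\n", pick_victim(oc));   /* 2: clean, lowest retention */
          return 0;
      }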
  • Patent number: 9600288
    Abstract: A system and method for efficiently accessing operands in a datapath. An apparatus includes a data operand register file and an execution pipeline with multiple stages. In addition, the apparatus includes a result bypass cache configured to store data results conveyed by at least the final stage of the execution pipeline. Control logic is included which is configured to determine whether source operands for an instruction entering the pipeline are available in the last stage of the pipeline or in the result bypass cache. If the source operands are available in the last stage of the pipeline or the result bypass cache, they may be obtained from one of those locations rather than reading from the register file. If the source operands are not available from the last stage or the result bypass cache, then they may be obtained from the data operand register file.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: March 21, 2017
    Assignee: Apple Inc.
    Inventors: Terence M. Potter, Timothy A. Olson, James S. Blomgren, Robert A. Drebin, Douglas C. Youngwith, Jon A. Loschke
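    The operand-lookup order in patent 9600288 can be summarised as a three-level check, sketched below: the value still held in the final pipeline stage, then the result bypass cache, then the register file. BypassCache and fetch_operand are names invented for the sketch.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      #define BYPASS_SLOTS 4

      /* A small cache of results that recently left the final pipeline stage. */
      typedef struct {
          uint8_t  reg[BYPASS_SLOTS];
          uint32_t value[BYPASS_SLOTS];
          bool     valid[BYPASS_SLOTS];
      } BypassCache;

      /* Source-operand lookup order assumed for this sketch: the value still in the
       * last pipeline stage, then the result bypass cache, then (only as a last
       * resort) the more expensive register-file read. */
      static uint32_t fetch_operand(uint8_t reg,
                                    bool last_stage_valid, uint8_t last_stage_reg,
                                    uint32_t last_stage_value,
                                    const BypassCache *bc, const uint32_t *reg_file)
      {
          if (last_stage_valid && last_stage_reg == reg)
              return last_stage_value;                    /* forwarded from final stage  */
          for (int i = 0; i < BYPASS_SLOTS; i++)
              if (bc->valid[i] && bc->reg[i] == reg)
                  return bc->value[i];                    /* hit in result bypass cache  */
          return reg_file[reg];                           /* fall back to register file  */
      }

      int main(void)
      {
          uint32_t rf[8] = { 0, 111, 222, 333, 444, 555, 666, 777 };
          BypassCache bc = { { 3, 0, 0, 0 }, { 9999, 0, 0, 0 }, { true, false, false, false } };
          printf("%u\n", (unsigned)fetch_operand(5, true, 5, 42, &bc, rf));   /* 42: last stage     */
          printf("%u\n", (unsigned)fetch_operand(3, false, 0, 0, &bc, rf));   /* 9999: bypass cache */
          printf("%u\n", (unsigned)fetch_operand(2, false, 0, 0, &bc, rf));   /* 222: register file */
          return 0;
      }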
  • Publication number: 20170075810
    Abstract: Techniques are disclosed relating to per-pipeline control for an operand cache. In some embodiments, an apparatus includes a register file and multiple execution pipelines. In some embodiments, the apparatus also includes an operand cache that includes multiple entries that each include multiple portions that are each configured to store an operand for a corresponding execution pipeline. In some embodiments, the operand cache is configured, during operation of the apparatus, to store data in only a subset of the portions of an entry. In some embodiments, the apparatus is configured to store, for each entry in the operand cache, a per-entry validity value that indicates whether the entry is valid and per-portion state information that indicates whether data for each portion is valid and whether data for each portion is modified relative to data in a corresponding entry in the register file.
    Type: Application
    Filed: September 11, 2015
    Publication date: March 16, 2017
    Inventors: Andrew M. Havlir, Terence M. Potter, Liang-Kai Wang
  • Patent number: 9594395
    Abstract: Techniques are disclosed relating to clock routing techniques in processors with both pipelined and non-pipelined circuitry. In some embodiments, an apparatus includes execution units that are non-pipelined and configured to perform instructions without receiving a clock signal. In these embodiments, one or more clock lines routed throughout the apparatus do not extend into the one or more execution units in each pipeline, reducing the length of the clock lines. In some embodiments, the apparatus includes multiple such pipelines arranged in an array, with the execution units located on an outer portion of the array and clocked control circuitry located on an inner portion of the array. In some embodiments, clock lines do not extend into the outer portion of the array. In some embodiments, the array includes one or more rows of execution units. These arrangements may further reduce the length of clock lines.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: March 14, 2017
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, James S. Blomgren, Terence M. Potter
  • Publication number: 20170024323
    Abstract: An apparatus includes an operand cache for storing operands from a register file for use by execution circuitry. In some embodiments, eviction priority for the operand cache is based on the status of entries (e.g., whether dirty or clean) and the retention priority of entries. In some embodiments, flushes are handled differently based on their retention priority (e.g., low-priority entries may be pre-emptively flushed). In some embodiments, timing for cache clean operations is specified on a per-instruction basis.
    Type: Application
    Filed: July 21, 2015
    Publication date: January 26, 2017
    Inventors: Andrew M. Havlir, Terence M. Potter
  • Publication number: 20160350113
    Abstract: Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result.
    Type: Application
    Filed: August 10, 2016
    Publication date: December 1, 2016
    Inventors: James S. Blomgren, Terence M. Potter
  • Patent number: 9508112
    Abstract: Techniques are disclosed relating to a multithreaded execution pipeline. In some embodiments, an apparatus is configured to assign a number of threads to an execution pipeline that is an integer multiple of a minimum number of cycles that an execution unit is configured to use to generate an execution result from a given set of input operands. In one embodiment, the apparatus is configured to require strict ordering of the threads. In one embodiment, the apparatus is configured so that the same thread accesses (e.g., reads and writes) a register file in a given cycle. In one embodiment, the apparatus is configured so that the same thread does not write back an operand and a result to an operand cache in a given cycle.
    Type: Grant
    Filed: July 31, 2013
    Date of Patent: November 29, 2016
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, James S. Blomgren, Terence M. Potter
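    The thread-count constraint in patent 9508112 can be illustrated by assuming a strict round-robin pipeline whose result latency equals the number of threads: the thread reading the register file in a given cycle is then the same thread writing back, so one port can serve both. The specific numbers below are assumptions for the sketch.

      #include <stdio.h>

      /* Assumed numbers: the execution unit takes MIN_CYCLES cycles per result, and
       * the pipeline runs NUM_THREADS = 1 * MIN_CYCLES threads in strict round-robin
       * order, with the write back landing exactly MIN_CYCLES cycles after issue. */
      #define MIN_CYCLES  4
      #define NUM_THREADS MIN_CYCLES

      int main(void)
      {
          /* Because the result latency is a whole number of round-robin passes, the
           * thread reading the register file in a cycle is the same thread whose
           * earlier instruction writes back in that cycle. */
          for (int cycle = 0; cycle < 10; cycle++) {
              int reading = cycle % NUM_THREADS;                           /* issuing thread */
              int writing = ((cycle - MIN_CYCLES) % NUM_THREADS + NUM_THREADS) % NUM_THREADS;
              printf("cycle %d: read thread %d, writeback thread %d\n", cycle, reading, writing);
          }
          return 0;
      }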
  • Patent number: 9459869
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may store a subset of operands, and may use less power and have quicker access times than the register file. In some embodiments, intelligent operand prefetching may speed execution by reducing memory bank conflicts (e.g., conflicts within a register file containing multiple memory banks). An unused operand slot for another instruction (e.g., an instruction that does not require a maximum number of source operands allowed by an instruction set architecture) may be used to prefetch an operand for another instruction in one embodiment. Prefetched operands may be stored in an operand cache, and prefetching may occur based on software-provided information.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: October 4, 2016
    Assignee: Apple Inc.
    Inventors: Timothy A. Olson, Terence M. Potter, James S. Blomgren, Andrew M. Havlir
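    The unused-slot prefetch of patent 9459869 is easy to picture: when an instruction needs fewer than the ISA's maximum source operands and software supplies a prefetch register, the spare read slot fetches that operand early so a later instruction finds it in the operand cache. MAX_SRC, the Instruction fields, and build_reads below are assumptions for the sketch.

      #include <stdint.h>
      #include <stdio.h>

      #define MAX_SRC 3   /* maximum source operands allowed by the assumed ISA */

      typedef struct {
          uint8_t src[MAX_SRC];     /* register numbers                      */
          uint8_t num_src;          /* how many of the slots are really used */
          uint8_t prefetch_reg;     /* software-provided: fetch this early   */
          uint8_t has_prefetch;
      } Instruction;

      /* Build the list of register-file reads issued for one instruction.  A free
       * source slot is used to prefetch a software-nominated operand so a later
       * instruction avoids a potential register-file bank conflict. */
      static int build_reads(const Instruction *in, uint8_t reads[MAX_SRC])
      {
          int n = 0;
          for (int i = 0; i < in->num_src; i++)
              reads[n++] = in->src[i];
          if (in->has_prefetch && n < MAX_SRC)
              reads[n++] = in->prefetch_reg;      /* prefetch uses the spare slot */
          return n;
      }

      int main(void)
      {
          Instruction add2 = { { 4, 7, 0 }, 2, 12, 1 };   /* two sources, prefetch r12 */
          uint8_t reads[MAX_SRC];
          int n = build_reads(&add2, reads);
          for (int i = 0; i < n; i++)
              printf("read r%d\n", reads[i]);             /* r4 r7 r12 */
          return 0;
      }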
  • Patent number: 9442730
    Abstract: Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result.
    Type: Grant
    Filed: July 31, 2013
    Date of Patent: September 13, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
  • Patent number: 9417843
    Abstract: Techniques are disclosed relating to performing extended multiplies without a carry flag. In one embodiment, an apparatus includes a multiply unit configured to perform multiplications of operands having a particular width. In this embodiment, the apparatus also includes multiple storage elements configured to store operands for the multiply unit. In this embodiment, each of the storage elements is configured to provide a portion of a stored operand that is less than an entirety of the stored operand in response to a control signal from the apparatus. In one embodiment, the apparatus is configured to perform a multiplication of given first and second operands having a width greater than the particular width by performing a sequence of multiply operations using the multiply unit, using portions of the stored operands and without using a carry flag between any of the sequence of multiply operations.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: August 16, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
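    The carry-flag-free wide multiply of patent 9417843 follows the familiar schoolbook decomposition: a 64x64-bit product is built from four 32x32-bit partial products whose carries are absorbed by wider accumulators rather than a carry flag. The sketch below shows one way to write it; the patent's operand-portioning details are not reproduced.

      #include <stdint.h>
      #include <stdio.h>

      /* 64 x 64 -> 128-bit unsigned multiply built from 32-bit multiplies, with all
       * carries absorbed by wider partial sums instead of a carry flag. */
      static void mul64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
      {
          uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
          uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

          uint64_t p0 = a_lo * b_lo;            /* four 32x32 -> 64 partial products */
          uint64_t p1 = a_lo * b_hi;
          uint64_t p2 = a_hi * b_lo;
          uint64_t p3 = a_hi * b_hi;

          /* Sum the middle column; the 64-bit accumulator holds any carries. */
          uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

          *lo = (mid << 32) | (uint32_t)p0;
          *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
      }

      int main(void)
      {
          uint64_t hi, lo;
          mul64x64_128(0xFFFFFFFFFFFFFFFFull, 0xFFFFFFFFFFFFFFFFull, &hi, &lo);
          printf("hi=%016llx lo=%016llx\n",    /* fffffffffffffffe 0000000000000001 */
                 (unsigned long long)hi, (unsigned long long)lo);
          return 0;
      }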
  • Patent number: 9378146
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from a register file may be energy and/or time intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Selectors (e.g., multiplexers) may be used to read operands from the operand cache. Power savings may be achieved in some embodiments by activating only a subset of the selectors, which may be done by activators (e.g., flip-flops). Operands may also be concurrently provided to two or more locations via forwarding, which may be accomplished via a source selection unit in some embodiments. Operand forwarding may also reduce power and/or speed execution in one or more embodiments.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: June 28, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter, Timothy A. Olson, Andrew M. Havlir
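    One way to picture the selector gating of patent 9378146 is sketched below: each operand-cache read port has a selector, and only the selectors needed in a cycle are activated (inactive ones are simply skipped in this model, standing in for saved switching power). The structures and names are assumptions for illustration.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      #define NUM_SELECTORS 4   /* one read selector (mux) per operand-cache read port */

      /* Only the selectors actually needed this cycle are toggled by their
       * activators; inactive selectors keep their previous output. */
      typedef struct {
          bool     active[NUM_SELECTORS];
          uint32_t out[NUM_SELECTORS];
      } ReadPorts;

      static void read_operands(ReadPorts *ports, const uint32_t *oc_entries,
                                const uint8_t *select, const bool *needed)
      {
          for (int s = 0; s < NUM_SELECTORS; s++) {
              ports->active[s] = needed[s];
              if (needed[s])
                  ports->out[s] = oc_entries[select[s]];   /* selector reads one cache entry */
          }
      }

      int main(void)
      {
          uint32_t oc[8] = { 5, 6, 7, 8, 9, 10, 11, 12 };
          uint8_t  sel[NUM_SELECTORS]    = { 3, 0, 0, 0 };
          bool     needed[NUM_SELECTORS] = { true, false, false, false };
          ReadPorts ports = {0};
          read_operands(&ports, oc, sel, needed);
          printf("port0=%u active1=%d\n", (unsigned)ports.out[0], ports.active[1]);   /* 8, 0 */
          return 0;
      }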
  • Publication number: 20160093014
    Abstract: A data queuing and format apparatus is disclosed. A first selection circuit may be configured to selectively couple a first subset of data to a first plurality of data lines dependent upon control information, and a second selection circuit may be configured to selectively couple a second subset of data to a second plurality of data lines dependent upon the control information. A storage array may include multiple storage units, and each storage unit may be configured to receive data from one or more data lines of either the first or second plurality of data lines dependent upon the control information.
    Type: Application
    Filed: September 25, 2014
    Publication date: March 31, 2016
    Inventors: Liang Xia, Robert D. Kenney, Benjiman L. Goodman, Terence M. Potter
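    The data queuing and format apparatus of publication 20160093014 can be approximated as two selection circuits that each forward a control-selected subset of the incoming data onto their own group of data lines, with each storage unit capturing one line chosen by the same control information. The Control layout below is an assumption for the sketch.

      #include <stdint.h>
      #include <stdio.h>

      #define NUM_UNITS 4

      /* Control information assumed for this sketch: which subset of the incoming
       * data each selection circuit forwards, and which data line each storage
       * unit captures. */
      typedef struct {
          uint8_t first_offset;              /* subset chosen by the first selection circuit  */
          uint8_t second_offset;             /* subset chosen by the second selection circuit */
          uint8_t unit_source[NUM_UNITS];    /* < NUM_UNITS: line in the first group,
                                                otherwise: line in the second group           */
      } Control;

      static void queue_and_format(const uint32_t *data, const Control *ctl,
                                   uint32_t storage[NUM_UNITS])
      {
          uint32_t first_lines[NUM_UNITS], second_lines[NUM_UNITS];
          for (int i = 0; i < NUM_UNITS; i++) {
              first_lines[i]  = data[ctl->first_offset  + i];   /* first selection circuit  */
              second_lines[i] = data[ctl->second_offset + i];   /* second selection circuit */
          }
          for (int u = 0; u < NUM_UNITS; u++) {                 /* each storage unit picks a line */
              uint8_t s = ctl->unit_source[u];
              storage[u] = (s < NUM_UNITS) ? first_lines[s] : second_lines[s - NUM_UNITS];
          }
      }

      int main(void)
      {
          uint32_t data[12] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
          Control ctl = { 0, 4, { 0, 1, 4, 5 } };
          uint32_t stored[NUM_UNITS];
          queue_and_format(data, &ctl, stored);
          printf("%u %u %u %u\n", (unsigned)stored[0], (unsigned)stored[1],
                 (unsigned)stored[2], (unsigned)stored[3]);   /* 0 1 4 5 */
          return 0;
      }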